
Statistical Analysis of Clusters of Extreme Events

THÈSE N° 4312 (2009)

PRÉSENTÉE LE 19 MAI 2009
À LA FACULTÉ SCIENCES DE BASE
CHAIRE DE STATISTIQUE
PROGRAMME DOCTORAL EN MATHÉMATIQUES

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

POUR L'OBTENTION DU GRADE DE DOCTEUR ÈS SCIENCES

PAR

Mária SÜVEGES

acceptée sur proposition du jury:

Prof. S. Morgenthaler, président du jury
Prof. A. C. Davison, directeur de thèse
Dr Ph. Naveau, rapporteur
Prof. V. Panaretos, rapporteur
Prof. J. Tawn, rapporteur

Suisse
2009
Contents
Acknowledgements 5
Version abrégée 7
Abstract 9
1 Introduction 11
2 Background 15
2.1 Point process of exceedances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 The extremal types theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Point processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.3 The point process approach for exceedances . . . . . . . . . . . . . . . . . 22
2.1.4 Interpretation of the extremal index . . . . . . . . . . . . . . . . . . . . . 26
2.1.5 Methods for the estimation of the extremal index . . . . . . . . . . . . . . 29
The blocks method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
The runs method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
The intervals method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
The two-threshold method . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Applicability of the methods in nonstandard situations . . . . . . . . . . . 34
2.2 Multivariate extreme values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.1 Multivariate extremes in the independent case . . . . . . . . . . . . . . . 36
2.2.2 The multivariate extremal index . . . . . . . . . . . . . . . . . . . . . . . 40
2.2.3 The M4 model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.3 Overview of the literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3 Univariate methods 49
3.1 Likelihood methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.1.1 The likelihood and model misspecification . . . . . . . . . . . . . . . . . . 50
3.1.2 Asymptotic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.3 Diagnostics for threshold and run parameter selection . . . . . . . . . . . 53
Misspecification tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Condition D^(K)(u_n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.4 The iterative least squares estimator . . . . . . . . . . . . . . . . . . . . . 56
3.1.5 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
The use of D^(K)(u_n) . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
The use of misspecification tests . . . . . . . . . . . . . . . . . . . . . . 59
3.1.6 Local extremal index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Smoothing methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Nonstationary simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2 Data analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2.1 The Central England temperature series . . . . . . . . . . . . . . . . . . . 67
3.2.2 The Neuchâtel daily minimum temperatures . . . . . . . . . . . . . . . . . 71
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4 M3 modelling 87
4.1 M3 approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.2 Preliminary cluster identification . . . . . . . . . . . . . . . . . . . . . . . 89
4.1.3 Dirichlet mixtures for the signatures . . . . . . . . . . . . . . . . . . . . . 90
4.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.2.1 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.2.2 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5 Multivariate methods 105
5.1 Pointwise methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1.1 Estimation procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.1.2 Simulated example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.2 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3 M4 modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.3.1 M4 approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.3.2 Modifications of the univariate method for the multivariate case . . . . . 118
5.3.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.4 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6 Discussion 133
A The GEV modelling of the Neuchâtel temperature sequences 135
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.2 Statistical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
A.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
A.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
B The EM algorithm 147
Acknowledgements
My first warm acknowledgements go to my thesis advisor, Anthony Davison. Thanks to his
infinite patience and flexibility, over these last five years I was able to realize the almost-impossible,
the dream of every woman in science: to do research I'm fascinated with and to live with my
kids the moments which were not occupied by school. This is a gift almost beyond thanks.
Thank you for the five years of fun and the pleasure of research, for all the things I learned from
you, and for providing us the possibility of a happy family life.
I also wish to thank my jury members Philippe Naveau, Jonathan Tawn and Victor Panaretos
for reading my thesis carefully, providing an interesting discussion, and proposing small
corrections which much improved the presentation of the results.
I am deeply grateful to my mother, than whom a warmer-hearted, more loving being cannot
be imagined. Her trust and love have accompanied me through these five years.
I thank my mother-in-law, father-in-law and sister-in-law for the summers when I could spend
a month alone with Richi and with the work. What a gift!
To say thanks to you, Richi, there are no words.
Marika
Version abrégée
Cette thèse est une contribution à la statistique des valeurs extrêmes, plus précisément à
l'estimation des groupes de valeurs extrêmes dans une série temporelle. Une mesure de leur
tendance à se regrouper peut être l'inverse de la taille moyenne d'un groupe, qui est appelé
indice extrémal dans ce contexte. En plus de sa relation avec la taille des groupes des extrêmes,
l'indice extrémal apparaît aussi comme le paramètre mesurant l'effet de la dépendance temporelle
sur les quantiles extrêmes de la fonction de répartition des données. Bien qu'il existe
plusieurs méthodes pour l'estimer dans les séries univariées, ces méthodes ne sont adaptées que
pour des séries strictement stationnaires satisfaisant une condition d'indépendance asymptotique
sur les niveaux extrêmes. Elles ne peuvent admettre des variables explicatives, et ne donnent que
des estimateurs grossiers pour les données multivariées. Ces caractéristiques sont très restrictives.
Dans les séries climatiques, les hypothèses de stationnarité et d'indépendance asymptotique
peuvent toutes deux être rompues par le changement climatique et par l'existence possible de
mémoire longue. Aussi, omettre l'inclusion des variables simultanées liées à la variable d'intérêt
peut entraîner une perte d'efficacité.
La thèse aborde ces questions. Premièrement, le théorème de Ferro and Segers (2003) concernant
la répartition des intervalles entre valeurs extrêmes sera étendu : nous introduisons les
intervalles tronqués entre valeurs extrêmes, appelés les K-gaps, et montrons qu'ils suivent la
même loi que les intervalles non tronqués. L'optimisation de la fonction de vraisemblance construite
à partir de cette loi donne naissance à un estimateur connu analytiquement. La méthode
permet l'utilisation de variables explicatives et de techniques de lissage, ouvrant la possibilité de
son application aux cas non stationnaires. Nous illustrons cette méthode à l'aide d'exemples
simulés et de jeux de données réelles.
La vraisemblance, définie comme indépendante pour toutes valeurs du paramètre K, est
mal spécifiée si K est trop petit. Ceci motive une autre contribution de la thèse, l'introduction
de tests de misspécification basés sur la matrice d'information de Fisher. Pour notre fonction
de vraisemblance, ces tests sont capables de détecter une misspécification suite à différentes
causes, pas seulement celle due à une mauvaise sélection de K. Ils fournissent de
l'aide aussi dans le choix du seuil, et décèlent les violations des conditions fondamentales de
stationnarité ou d'indépendance asymptotique. En outre, ces tests diagnostiques sont développés
pour des modèles généraux ; ils peuvent donc être adaptés à d'autres modèles de la statistique
des extrêmes, qui sont toujours approximatifs. La performance des tests dans le contexte des
groupes de valeurs extrêmes est illustrée à l'aide de données simulées. Deux séries réelles à
caractère non stationnaire complexe démontrent leur utilité dans des situations où les hypothèses
fondamentales sont violées.
Dans le cas multivarié, le paramètre correspondant à l'indice extrémal est la fonction d'indice
extrémal multivarié. Comme dans le cas univarié, son apparition est liée à la dépendance
temporelle du processus observé. Les méthodes univariées peuvent être appliquées, mais les
estimations résultantes sont grossières, elles varient excessivement, et ne satisfont pas aux conditions
sur la fonction d'indice extrémal multivarié. La troisième contribution de la thèse est le
développement d'une nouvelle méthode fondée sur l'approximation M4 de Smith and Weissman
(1996), qui peut être appliquée à l'estimation de la fonction d'indice extrémal multivarié
comme à celle d'autres caractéristiques des groupes de valeurs extrêmes. Dans ce but, nous
créons une procédure pour l'identification préliminaire des groupes d'extrêmes, et modélisons
la distribution du bruit dû aux seuils finis par un modèle semi-paramétrique, le mélange de
distributions de Dirichlet. Nous ajustons le modèle par l'algorithme EM. Les mêmes exemples
simulés et les mêmes données réelles illustrent la performance de la nouvelle méthode.
Mots-clés : fonction d'indice extrémal multivarié, indice extrémal, mélange de distributions
de Dirichlet, misspécification, modèle M4, vraisemblance
Abstract
The thesis is a contribution to extreme-value statistics, more precisely to the estimation of
clustering characteristics of extreme values. One summary measure of the tendency to form
groups is the inverse average cluster size. In the extreme-value context, this parameter is called
the extremal index, and apart from its relation with the size of groups, it appears as an important
parameter measuring the effects of serial dependence on extreme levels in time series. Although
several methods exist for its estimation in univariate sequences, these methods are only applicable
for strictly stationary series satisfying a long-range asymptotic independence condition on
extreme levels, cannot take covariates into consideration, and yield only crude estimates for the
corresponding multivariate quantity. These are strong restrictions and great drawbacks. In climatic
time series, both stationarity and asymptotic independence can be broken, due to climate
change and possible long memory of the data, and not including information from simultaneously
measured linked variables may lead to inefficient estimation.
The thesis addresses these issues. First, we extend the theorem of Ferro and Segers (2003)
concerning the distribution of inter-exceedance times: we introduce truncated inter-exceedance
times, called K-gaps, and show that they follow the same exponential-point mass mixture
distribution as the inter-exceedance times. The maximization of the likelihood built on this
distribution yields a simple closed-form estimator for the extremal index. The method can admit
covariates and can be applied with smoothing techniques, which allows its use in a nonstationary
setting. Simulated and real data examples demonstrate the smooth estimation of the extremal
index.
The likelihood, based on an assumption of independence of the K-gaps, is misspecified
whenever K is too small. This motivates another contribution of the thesis, the introduction
into extreme-value statistics of misspecification tests based on the information matrix. For our
likelihood, they are able to detect misspecification from any source, not only those due to a bad
choice of the truncation parameter. They also provide help in threshold selection, and show
whether the fundamental assumptions of stationarity or asymptotic independence are broken.
Moreover, these diagnostic tests are of general use, and could be adapted to many kinds of
extreme-value models, which are always approximate. Simulated examples demonstrate the
performance of the misspecification tests in the context of extremal index estimation. Two data
examples with complex behaviour, one univariate and the other bivariate, offer insight into their
power in discovering situations where the fundamental assumptions of the likelihood model are
not valid.
In the multivariate case, the parameter corresponding to the univariate extremal index is
the multivariate extremal index function. As in the univariate case, its appearance is linked to
serial dependence in the observed processes. Univariate estimation methods can be applied, but
are likely to give crude, unreasonably varying estimates, and the constraints on the extremal
index function implied by the characteristics of the stable tail dependence function are not
automatically satisfied. The third contribution of the thesis is the development of methodology
based on the M4 approximation of Smith and Weissman (1996), which can be used to estimate
the multivariate extremal index, as well as other cluster characteristics. For this purpose, we
give a preliminary cluster selection procedure, and approximate the noise at finite levels with
a flexible semiparametric model, the Dirichlet mixtures used widely in Bayesian analysis. The
model is fitted by the EM algorithm. Advantages and drawbacks of the method are discussed
using the same univariate and bivariate examples as the likelihood methods.
Keywords: Dirichlet mixture, extremal index, likelihood, M4 model, misspecification, multivariate
extremal index function
Chapter 1
Introduction
Extreme-value statistics has gained in interest during recent years. This branch of stochastic
modelling provides appropriate techniques to infer probabilities of specified unusual random
events, or to estimate the size of disasters that may be expected only rarely. Such events can
have a powerful impact on human life. What sea level is likely to be exceeded only once
during the next thousand years at a specied dyke on the shores of the Netherlands? How often
would a windstorm of the strength of Lothar sweep over a particular industrial site or large city?
How often can we expect heatwaves similar to that of the summer of 2003? How long will they
be? How high can the peak temperatures be?
Extreme-value statistics tries to provide quantitative answers to such questions, together with
a measure of uncertainty of the answers. It provides a wide variety of methods for the estimation
of sizes of extreme events of independent, identically distributed or stationary univariate or
multivariate time series. However, the sample of typical questions above shows that another
important aspect of the extreme events must also be dealt with: their tendency to cluster. A
heatwave, a long series of consecutive hot days and hot nights, can be defined as a group of
observations exceeding some high thresholds simultaneously in daily minimum and maximum
temperatures. Similarly, a windstorm is dened by unusually high wind speeds, measured at
various locations in a region and correlated in time. The length of the period for which sea
levels are high may be important for security. Thus, clustering of extreme events should be an
inherent part of the estimation of extremes.
There are at least two important issues to be considered when trying to assess climatic
extremes. The rst is that the extreme temperatures, winds or sea levels are not isolated, inde-
pendent events. They are obviously part of the climate, which is a complex system with many
13
CHAPTER 1. INTRODUCTION
variables linked together. Similarly to simple regression, we can expect better estimates and
forecasts of the variables of interest, if it can include information from other climatic variables
measured simultaneously. Part of the variation will be explained by the changes in the covari-
ates, so estimates will in general have less unexplained variability. Turning again to the three
examples, it is likely that extreme temperatures are related to specic atmospheric conditions
that can be described with a collection of simultaneous variables. Wind speeds may be corre-
lated with pressure and especially pressure gradients. Better estimates of extreme sea levels can
be achieved if based on joint models for wind speed and wave surge.
The second issue is that we live in a changing climate. This makes statistical inference more
difficult, since simple extrapolation of results based on past and present data is likely to give
invalid estimates for the future. Incidence of heatwaves or windstorms, or even conditions that
lead to extreme sea levels might change with global changes. In a changing climate, the clusters
of extreme events can also show variation with time, which should be possible to estimate.
The clustering of extremes from a univariate time series can be assessed using several meth-
ods, mainly useful for estimating the average cluster length. Other cluster characteristics, such
as the distribution of their length, the sizes of the extremes comprised or the total excess, are
generally estimated in an empirical way. But these methods are less adequate in a multivariate
case, and fail to answer any of the issues listed above. There is no way to incorporate additional
information from the other climatic variables, and we have only crude methods to deal with the
nonstable behaviour of the climate, so our estimates and forecasts regarding the clustering of
extreme events are still somewhat rudimentary.
This thesis tries to make a step towards solving these problems. Likelihood methods in
general are able to incorporate both time and covariates, and can be used in a parametric or
semiparametric way. We consider their use in the context of clustering of extreme events.
The distribution of the truncated time periods between extreme events provides a possibility
to construct likelihood methods for the estimation of the extremal index. This likelihood gives
rise to tests, based on the associated information matrix, which are able to detect problems with
the underlying assumptions, namely stationarity, independence of the sample or long-range
dependence of the data-generating process. As a byproduct, these tests turn out to help also
in a fundamental issue of extreme-value statistics, threshold selection. We give examples of
their use, and show that they are adapted to give refined pictures of real data that behave in
a complicated way. Extension to the multivariate case is also considered, and the problems of
their application and their performance are presented.
We consider a model, the M4 process, which has the advantage of yielding a probabilistic
description of the entire trajectory around the extremes. It may therefore have even broader use
than the likelihood method based on the inter-exceedance times: first, it directly yields estimates
of cluster statistics, and second, it does not break the links between the simultaneous variables
around extremes, thus offering a way to include physics-based information into statistics. In
this aspect, the M4 process is unique among the extreme-value models. We make here only a
first assessment of the potential of the M4 approximation, answering the fundamental questions
whether the M4 clusters can be recognized in an observed process, and if so, whether they can
be used for statistical inference of the extremal index and extremal index function. Further
development of the method turns out to be a promising possibility, despite the many difficulties.
Chapter 2 summarizes the background theory necessary to develop methods for the clustering
of extremes, Section 2.1 for univariate sequences, Section 2.2 for multivariate sequences and M4
processes. Section 2.3 also contains a short overview of the literature omitted from Sections 2.1
and 2.2.
Chapter 3 deals with univariate estimation of clustering: Section 3.1 introduces the likelihood
methods and the information matrix tests, and shows a few simulated examples, Section 3.2 adds
a data example for the smooth estimation of the extremal index and another for the use of the
information matrix tests.
Chapter 4 considers the possible use of the M4 methods in the univariate case. Section 4.3
summarizes the results obtained with the two methods in univariate estimation.
Chapter 5 presents the two methods in the multivariate context: Section 5.1 discusses the
likelihood method, with a data example given in Section 5.2, and Section 5.3 deals with the M4
approximation. The analysis of the same data set is performed with the M4 approximation as
well, and is presented in Section 5.4. A summary of the results obtained with the two methods
can be found in Section 5.5. Chapter 6 discusses the new methods and their use for investigating
the climate.
Appendix A contains an analysis of one of the data examples, the Neuchâtel daily minimum
temperatures, from a different point of view. Since this is not directly related to the main work
presented in the thesis, and the results are more interesting from the point of view of climate
statistics than from statistical methodology, it does not form an essential part of the thesis.
However, the results provide part of the motivation to survey the data with special regard to
nonstationarity, so we present it for the sake of completeness. Appendix B gives an account of
the EM algorithm, which is the main tool used to fit the M4 model.
Chapter 2
Background
In this chapter, we first summarize the theory of the extremes of univariate stationary random
sequences. We begin with the distributional characteristics of sample maxima obtained using
functional analysis methods, first for independent, then for dependent sequences. The extremal
index is introduced as the linking parameter between the distributions of maxima from stationary
and independent sequences with the same marginal distribution. Point process methods are used
to gain further insight into the role of the extremal index. Based on the results presented so far,
we review the multiple roles of the extremal index in the clustering process of extremes, and give
an account of the existing methods for the estimation of the extremal index.
Section 2.2 presents the background for multivariate sample extremes. A very brief presenta-
tion of the distribution of the componentwise maxima from independent multivariate sequences
is given, followed by a summary for the stationary case and the introduction of the multivariate
extremal index. Finally the M4 model is presented.
Section 2.3 closes the chapter with an account of the wide literature omitted from the previous
sections.
2.1 Point process of exceedances
2.1.1 The extremal types theorem
Consider n independent, identically distributed univariate variables X_i with common distribution
F, and denote the maximum by M_n = max{X_1, . . . , X_n}. Take a sequence of real numbers
u_n, and let S_n = Σ_{i=1}^n I(X_i > u_n). As the X_i are independent, S_n is a binomial variable with
parameters n and F̄(u_n) = 1 − F(u_n), with expected value nF̄(u_n). If we are able to choose
u_n so that lim nF̄(u_n) = τ for a constant τ as n tends to infinity, then we can apply the Poisson
approximation to a binomial variable, finding that for large n, S_n follows approximately a
Poisson distribution with parameter τ. This is formulated in the following theorem.
Theorem 2.1. Let 0 ≤ τ ≤ ∞, and suppose that u_n is a sequence of thresholds such that

nF̄(u_n) → τ as n → ∞. (2.1)

Then Pr{M_n ≤ u_n} → e^{−τ} as n → ∞ for τ ∈ [0, ∞].
If a sequence u_n can be found satisfying the limit (2.1) for a nonzero τ, such a sequence can
be found for any other τ > 0 too. Unfortunately, the existence of such sequences is not assured;
the necessary and sufficient condition for this is lim_{x→x_R} {1 − F(x)}/{1 − F(x−)} = 1, where
x_R = sup{x : F(x) < 1} and F(x−) = lim_{y↑x} F(y). Distributions having a jump at their
right endpoint never admit such sequences, and discrete variables must satisfy certain conditions
on the size of the jumps in their right tail. The Poisson and the geometric variables are the
best-known discrete examples for which (2.1) does not hold.
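The failure of (2.1) for the geometric distribution can be seen numerically: its survivor function decreases by a fixed factor at each integer, so n F̄(u_n) cannot be steered to a finite nonzero τ along any integer threshold sequence. A small illustration of ours (not from the thesis):

```python
import numpy as np

p = 0.5                                    # geometric on {0, 1, 2, ...}
# Survivor function F_bar(k) = (1 - p)**(k + 1): moving the threshold by one
# multiplies n * F_bar(u_n) by the fixed factor 1 - p, so it cannot settle at tau = 1.
best = []
for n in range(10, 2000):
    k = np.arange(60)
    nf = n * (1 - p) ** (k + 1)            # n * F_bar(k) over candidate thresholds k
    best.append(nf[np.argmin(np.abs(nf - 1.0))])  # achievable value closest to tau = 1
best = np.array(best)
print(best.min(), best.max())              # stays spread out instead of tending to 1
```

However large n is, the best achievable value of n F̄(u_n) oscillates in a band around 1 rather than converging.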
A limiting class of distributions can be identified under a linear normalization, that is,
the class of distributions which satisfy Pr{(M_n − b_n)/a_n ≤ x} → G(x) as n → ∞. This
distribution should reflect the fact that the maximum of the maxima of very long blocks of
the sequence X_i is equal to the maximum of the whole sequence. Therefore, when the length
of the blocks tends to infinity, it should follow the same distribution as the block maxima,
apart from the normalizing constants, which depend on n. This loosely put observation is
formulated by introducing the notion of max-stable distributions, and stating that a limiting
distribution of the maximum of an independent, identically distributed random series must be max-stable.
Moreover, all possible forms of max-stable distributions, and therefore all possible forms of
limiting distributions for maxima, can be identified.
Definition 2.2. A distribution function G is max-stable if for any k ∈ N, there exist constants
α_k > 0 and β_k such that

G^k(α_k x + β_k) = G(x). (2.2)
Theorem 2.3 (de Haan (1970)). The set of possible nondegenerate limit distributions under a
linear normalization coincides with the class of max-stable distributions.
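Definition 2.2 can be verified directly for the Gumbel law introduced in Theorem 2.4 below, for which α_k = 1 and β_k = log k satisfy (2.2). A short numerical check of ours:

```python
import math

def gumbel(x):
    # Type I (Gumbel) distribution function, Lambda(x) = exp(-exp(-x))
    return math.exp(-math.exp(-x))

# Max-stability (2.2): Lambda^k(x + log k) = Lambda(x), with alpha_k = 1, beta_k = log k,
# since k * exp(-(x + log k)) = exp(-x)
for k in (2, 10, 1000):
    for x in (-1.0, 0.0, 2.5):
        assert abs(gumbel(x + math.log(k)) ** k - gumbel(x)) < 1e-9
print("Gumbel is max-stable with alpha_k = 1, beta_k = log k")
```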
Theorem 2.4 (Extremal types theorem; Fisher and Tippett (1928); Gnedenko (1943)). If there
exist sequences of constants a_n > 0 and b_n such that

Pr{(M_n − b_n)/a_n ≤ x} → G(x)

for a nondegenerate function G(x), then G(x) must be one of the following three types, up to
affine transformations:

Type I (Gumbel): Λ(x) = exp(−e^{−x}), x ∈ R;

Type II (Fréchet): Φ_α(x) = exp(−x^{−α}) if x ≥ 0, and Φ_α(x) = 0 if x < 0, for some α > 0;

Type III (Weibull): Ψ_α(x) = exp{−(−x)^α} if x ≤ 0, and Ψ_α(x) = 1 if x > 0, for some α > 0.
The commonly used form of the extreme-value distributions is

G(x) = exp[−{1 + ξ(x − μ)/σ}^{−1/ξ}], (2.3)

defined for {x : 1 + ξ(x − μ)/σ > 0}, with parameters ξ, μ ∈ R and σ > 0. In this notation,
ξ = α^{−1} when G is of type II and ξ = −α^{−1} when it is of type III. The case ξ = 0 is interpreted
as the limit ξ → 0 in equation (2.3). This form is more convenient for estimation: the unified
expression for the three types allows for the estimation of ξ, without the need for a preliminary
choice of type to fit. The three differently behaving subclasses can be distinguished by the sign
of ξ. The case ξ = 0 covers for example the maxima of normal or exponential variables, with
exponentially decaying tail. For ξ > 0, the distribution is defined on x > μ − σ/ξ. This class
models maxima of heavy-tailed variables such as the Pareto or the t distributions. For ξ < 0, the
distribution is defined on x < μ − σ/ξ, so that there is an upper limit for the possible maximum
values; this case is appropriate for maxima of bounded random variables like the uniform.
The existence of a linear normalization leading to a nondegenerate limiting distribution
is not assured in general. Obviously, no such sequences can be found for the Poisson or the
geometric distributions, but the restriction to linear transformations excludes some continuous
distributions too. Thus, for a given distribution function, the first question is whether it admits
such threshold sequences. If yes, the next question is to determine to which extreme-value class
the distribution of the maximum belongs. Necessary and sufficient conditions based on the tail
behaviour of F are summarized for example in Leadbetter et al. (1983) or Resnick (1987). A
set of sufficient conditions, characterizing the domain of attraction if F has a density, is given
by von Mises (1954), and a general approach based on the concept of regular variation can be
found in de Haan (1970).
Models closer to reality can be obtained by relaxing the assumption of independence, and
replacing it by strict stationarity. We require that the joint distribution of X_{j_1}, . . . , X_{j_k} and
X_{j_1+m}, . . . , X_{j_k+m} be the same for all choices of k, j_1, . . . , j_k and m, and we seek the possible
limiting distributions of the maxima of such sequences.
Without restricting the strength of the dependence, this limiting distribution can be anything.
An obvious example is a sequence where X_1 follows an arbitrary probability law F,
and X_i = X_1 almost surely for i > 1. A less artificial example is a normal sequence with
sufficiently slowly decaying correlation. Mittal and Ylvisaker (1975) demonstrated for example
normal behaviour for sample maxima when the correlations r_k satisfy lim_{k→∞} r_k = 0 but
lim_{k→∞} r_k log k = ∞. Long-memory processes are in general examples of such random sequences.
An asymptotic condition, requiring that the exceedances above a threshold of a stationary
sequence become nearly independent as the threshold increases, must be imposed. For
a sequence of thresholds u_n, consider the events A = ∩_{i∈I} {X_i ≤ u_n} and B = ∩_{i∈J} {X_i ≤ u_n},
where I ⊂ {1, . . . , k} and J ⊂ {k + l, . . . , n}. Define

α_{n,l} = sup_{k,I,J} |Pr{A ∩ B} − Pr{A} Pr{B}|.

Condition D(u_n) (Leadbetter, 1974; Leadbetter et al., 1983) is said to be satisfied if α_{n,l_n} → 0
as n → ∞ for some sequence l_n = o(n).
This condition covers a very broad range of processes that are used in statistical practice.
D(u_n) is sufficient for obtaining the same class of limiting distributions under linear normalization
with the same constants as in the independent case, though Theorem 2.1 does not carry
over without change (Leadbetter, 1974).
Theorem 2.5. Let M_n = max{X_1, . . . , X_n}. Suppose that there exists a sequence u_n(τ) such
that

nF̄(u_n(τ)) → τ as n → ∞, (2.4)

and that D(u_n(τ_0)) holds for some τ_0 > 0. Then if Pr{M_n ≤ u_n(τ)} converges for some
0 < τ ≤ τ_0, the limit will be

Pr{M_n ≤ u_n(τ)} → e^{−θτ} (2.5)

with a unique θ.
This theorem gives rise to the concept of the extremal index.

Definition 2.6. We shall say that the stationary sequence X_i has extremal index θ if
Pr{M_n ≤ u_n(τ)} → e^{−θτ} for all τ > 0 and u_n(τ) satisfying (2.4).
In general, the extremal index does not necessarily exist for a strictly stationary process even
under asymptotic independence; a counterexample is given in Embrechts et al. (1997). All that
can be said is that if condition D(u_n(τ)) holds, there exist 0 ≤ θ′ ≤ θ″ ≤ 1 such that

e^{−θ″τ} ≤ liminf_{n→∞} Pr{M_n ≤ u_n(τ)} ≤ limsup_{n→∞} Pr{M_n ≤ u_n(τ)} ≤ e^{−θ′τ}.
Theorem 2.7. Suppose that M_n has a nondegenerate limiting distribution G. Suppose also that D(u_n) holds for all sequences u_n = x/a_n + b_n (−∞ < x < ∞). Then G is one of the classical extremal types given in Theorem 2.4.
Estimation of the distribution of the maximum in stationary or independent sequences can therefore be based on the same models, that is, the extremal types and other, related models. Nevertheless, Theorem 2.5 hints at differences brought by the passage from independence to strict stationarity. The interpretation of Theorem 2.1 no longer holds: dropping independence means the failure of the binomial model and its Poisson approximation. The expected number of observations above a given threshold u_n is the same, according to conditions (2.1) and (2.4), but the probability that the largest observation is smaller than u_n is different. This implies that the limiting distribution of the maximum, though it belongs to the same class as if the sequence were independent, must be a different representative of the class. An elegant way to discuss the consequences in more detail is provided by point process theory.

CHAPTER 2. BACKGROUND
2.1.2 Point processes
Let X_1, …, X_n be a sequence of independent, identically distributed random vectors on the state space R̄^d, endowed with the Borel σ-algebra B. The bar means the closure of R^d. Let moreover ε_x(·) denote the Dirac measure concentrated at x ∈ R̄^d, defined by

ε_x(B) = 1 if x ∈ B,  ε_x(B) = 0 if x ∉ B,
for any B ∈ B. The measure defined as N(·) = Σ_{i=1}^n ε_{X_i}(·) is a random measure on B, with the value N(B) = Σ_{i=1}^n ε_{X_i}(B) taken on a set B ∈ B; this measure counts the number of points of the random process X_i falling into a subset B of the state space R̄^d. If its value is finite with probability 1 on all compact sets of the Borel σ-algebra, it is called a point process. The concept of point processes can be extended to include multiple points, describing processes where X_{i_1} = … = X_{i_m} for sets of indices {i_1, …, i_m}. In this case, the point process N can be given as N(·) = Σ_{i=1}^{n′} m_i ε_{X′_i}(·), where X′_i, i = 1, …, n′, are the n′ distinct values taken by X_i, and m_i is the multiplicity of the value X′_i. A point process is called simple if it has the form N = Σ_i ε_{Y_i}, that is, if m_i ≡ 1 almost surely.
A realization of a point process N, corresponding to a realization x_1, …, x_n of the random variables X_1, …, X_n, is the point measure N(·) = Σ_{i=1}^n ε_{x_i}(·). The probability distribution of the point process is given if all the finite-dimensional distributions of the vectors {N(B_1), N(B_2), …, N(B_k)} are given for any collection of Borel sets B_1, …, B_k ∈ B and any k ≥ 1. In other words, specifying the probability distribution of a point process is equivalent to specifying the joint probabilities Pr{N(B_1) = n_1, N(B_2) = n_2, …, N(B_k) = n_k} for all possible finite collections of sets from the Borel σ-algebra.
The most useful point process for discussing extremes of univariate random sequences X_1, …, X_n is the Poisson process. It serves as a limiting process when n → ∞, modelling either the times of arrival of the extremes, when it is one-dimensional (Leadbetter et al., 1983; Hsing et al., 1988), or both the sizes of the extremes and their times, when it is two-dimensional (Hsing, 1987). A Poisson process can be defined as follows.
Definition 2.8. Let μ be a Radon measure (that is, a locally finite measure) on B. A point process N on R̄^d is a Poisson process with mean measure μ if it satisfies the following two conditions:

(i) for k ≥ 0,

Pr{N(B) = k} = e^{−μ(B)} μ(B)^k / k! if μ(B) < ∞,  and Pr{N(B) = k} = 0 if μ(B) = ∞,

that is, the number of points in any set B follows a Poisson distribution with mean μ(B);

(ii) for mutually disjoint sets B_1, …, B_m, the random variables N(B_1), …, N(B_m) are independent.

An important special case of the Poisson process arises when the mean measure is equal to the Lebesgue measure on R^d multiplied by a constant λ; this is called the homogeneous Poisson process with intensity λ. One of its extensions, the compound Poisson process, relevant in the statistics of extremes, generalizes the Poisson process defined on the state space [0, ∞) by allowing it to have multiple points.
Definition 2.9. Let Y_i be the points of a Poisson process on [0, ∞) with mean measure μ. Let β_i be a sequence of nonnegative integer-valued random variables with common distribution π, independent of each other and independent of the Poisson process. The point process

N = Σ_{i≥1} β_i ε_{Y_i}

is called a compound Poisson process with marks β_i and mark distribution π.
To have a limiting point process model for the extremes of a stationary series when the length of the series tends to infinity, that is, when n → ∞, we need to define convergence of point processes. Weak convergence for point processes is equivalent to weak convergence of the finite-dimensional distributions.
Definition 2.10. Let N, N_1, N_2, … be point processes on a state space S ⊂ R̄^d equipped with the Borel σ-algebra. For any set B, let ∂B denote the boundary of B. The sequence N_n is said to converge weakly to N if, for all possible collections B_1, …, B_m of Borel sets such that Pr{N(∂B_i) = 0} = 1 for i = 1, …, m, the joint distribution of {N_n(B_1), …, N_n(B_m)} converges to that of {N(B_1), …, N(B_m)} as n → ∞. Weak convergence is denoted by N_n →_d N.
The fundamental theorem for the convergence of simple point processes to a Poisson process can now be given (Resnick, 1987, Proposition 3.21).

Theorem 2.11. For each n, let Y_{n;i}, i = 1, …, n, be an independent, identically distributed random sequence in R̄^d. Define the point process

N_n = Σ_{i=1}^n ε_{(i/n, Y_{n;i})}

for each n. Let N be a Poisson process on the state space R_+ × R̄^d with intensity |·| × ν, where |·| denotes the Lebesgue measure. Then

N_n →_d N as n → ∞

if and only if

n Pr{Y_{n;i} ∈ ·} → ν(·) as n → ∞.   (2.6)
In order to be concise, many details about the requirements on the underlying space and the
mode of convergence are omitted; these can be found in Resnick (1987, Chapter 3).
2.1.3 The point process approach for exceedances
The main object in the investigation of the extremes of a univariate random sequence is the point process of exceedances, where the exceedances are defined with respect to a high threshold. Consider first a sequence of independent, identically distributed random variables X_1, …, X_n with common marginal distribution function F, and let M_n = max{X_1, …, X_n}. The occurrences of the exceedances above u_n, that is, the points i/n for which X_i > u_n, form a point process on R_+:

η_n = Σ_{i=1}^n ε_{i/n} I(X_i > u_n),

where I(·) is the indicator function. The following theorem for the Poisson convergence of this point process is a simple consequence of Theorem 2.11: by time homogeneity, it can be proven that condition (2.1) implies (2.6) on all subintervals of (0, 1], thus entailing convergence in the point process sense (see also Leadbetter et al. (1983), Theorem 5.2.1).
Theorem 2.12. Let u_n be a sequence of real numbers. For an independent, identically distributed sequence X_i having common distribution function F, suppose that condition (2.1) holds:

n F̄(u_n) → τ as n → ∞.

Then

η_n →_d η as n → ∞,

where η is a homogeneous Poisson process on the positive real line with intensity τ.
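Theorem 2.12 is easy to check by simulation. In the sketch below (my illustration, assuming Uniform(0, 1) margins for convenience), the threshold u_n = 1 − τ/n gives n F̄(u_n) = τ exactly, and the exceedance counts behave like Poisson(τ) draws, with mean and variance both close to τ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau, reps = 10_000, 5.0, 500
u_n = 1.0 - tau / n          # for U(0,1) margins: n * (1 - F(u_n)) = tau

# count the exceedances of u_n in each of `reps` independent series
counts = (rng.uniform(size=(reps, n)) > u_n).sum(axis=1)
mean_count = counts.mean()   # Poisson limit: close to tau
var_count = counts.var()     # Poisson limit: variance close to the mean
```

The equality of mean and variance is the simplest fingerprint of the Poisson limit; for a dependent series it fails, which motivates the compound Poisson limits discussed below.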
As was stated in Section 2.1.1, we wish to find nondegenerate limiting models under a linear normalization of the variables, now in terms of a point process. Suppose that there exist sequences of constants a_n > 0 and b_n such that

Pr{(M_n − b_n)/a_n ≤ x} →_w G(x)   (2.7)

with G nondegenerate. Put u_n = a_n x + b_n, and define the point process

N_n = Σ_{i=1}^n ε_{(i/n, (X_i − b_n)/a_n)}.
A good choice of topology on R̄^d makes condition (2.7) equivalent to condition (2.6), since for large n, the convergence F^n(u_n) = F^n(a_n x + b_n) → G(x) is equivalent to n Pr{(X_1 − b_n)/a_n > x} → −log G(x). Then the application of Theorem 2.11 with Y_i = (X_i − b_n)/a_n yields the Poisson limit of these point processes as n → ∞. Let |·| denote Lebesgue measure.
Theorem 2.13 (Resnick (1987), Corollary 4.19). Suppose that the random sequence X_i satisfies the above assumptions, that is,

Pr{(M_n − b_n)/a_n ≤ x} = Pr{M_n ≤ u_n} = F^n(u_n) → G(x)

for some nondegenerate G with left and right endpoints x_L and x_R. By the extremal types theorem, G can be taken of form (2.3). Set ν(x, ∞] = −log G(x). Then

N_n →_d N as n → ∞,

where N is a Poisson point process with mean measure |·| × ν(·) on [0, ∞) × (x_L, x_R]. In particular:

(i) If G = Λ, set the state space of the process as E = (−∞, ∞], ν(x, ∞] = e^{−x} and x ∈ R. Then N_n →_d N as n → ∞, where N is a Poisson point process with mean measure |·| × ν(·) on [0, ∞) × (−∞, ∞].

(ii) If G = Φ_α, suppose that F(0) = 0, set the state space to be E = (0, ∞], ν(x, ∞] = x^{−α} and x > 0. Then N_n →_d N as n → ∞, where N is a Poisson point process with mean measure |·| × ν(·) on [0, ∞) × (0, ∞].

(iii) If G = Ψ_α, set the state space of the process to be E = (−∞, 0], ν(x, ∞] = (−x)^α and x < 0. Then N_n →_d N as n → ∞, where N is a Poisson point process with mean measure |·| × ν(·) on [0, ∞) × (−∞, 0].
For the strictly stationary case, the point process limits will be different. We consider first the point process of exceedance times, using the same definition as in the independent case: for a sequence of thresholds u_n such that n F̄(u_n) → τ,

η_n^{(τ)} = Σ_{i=1}^n ε_{i/n} I(X_i > u_n).
Just as in the derivation of the distributional limit for stationary sequences, we will need a condition restricting long-range dependence at extreme levels in the process. In order to obtain a weak point process limit, which is based on the convergence of all finite-dimensional distributions, the condition should now be required on a broader system, namely the σ-algebra generated by events of the type {X_i ≤ u_n}. Let B_{i,j}(u_n) = σ({X_s ≤ u_n} : i ≤ s ≤ j) denote the σ-algebra generated by the events {X_s ≤ u_n}, i ≤ s ≤ j. For each n and l = 1, …, n − 1, define

α_{n,l} = max_{A,B} |Pr{A ∩ B} − Pr{A} Pr{B}|,   (2.8)

where A ∈ B_{1,k}(u_n) and B ∈ B_{k+l,n}(u_n). Then

Condition Δ(u_n) (Hsing et al., 1988) is said to be satisfied if α_{n,l_n} → 0 as n → ∞ for some sequence l_n = o(n).
We have seen that Theorem 2.5 suggests a different point process limit than the simple Poisson process with intensity τ. If the limit were the simple Poisson process, the probability of not observing any exceedances among X_1, …, X_n would be e^{−τ}. The probability of having no points in (0, 1], stated by Theorem 2.5, is larger than that: it is equal to e^{−θτ} with 0 ≤ θ ≤ 1. So if a point process limit exists, it has on average fewer points on bounded sets than an independent process.
Intuitively, we may expect that the local dependence of X_i implies a tendency to clustering. Groups of exceedances appear, in which the exceedances are separated only by small distances that tend to zero in the limit when we rescale the times by n. Such exceedances merge as n grows, which can cause the limiting point process to have fewer points on average than the independent limit. To account for this, we must introduce an appropriate notion of clusters.
Partition the sequence X_1, …, X_n into s_n blocks of length r_n = ⌊n/s_n⌋, where ⌊x⌋ is the integer part of x, such that s_n → ∞, s_n l_n = o(n) and s_n α_{n,l_n} → 0. These choices ensure that the number of blocks increases as n grows, so that when we scale the times by n in order to obtain a point process in (0, 1], the size of each block shrinks to 0 in (0, 1] as n → ∞. Also, a separation l_n is needed to ensure near-independence between the maxima of neighbouring blocks, corresponding to condition D(u_n), by not considering l_n observations from the end of each block. On the other hand, l_n should be such that omitting these s_n l_n observations has vanishing influence on the distribution of the maxima as n → ∞. The choices s_n l_n /n → 0 and s_n α_{n,l_n} → 0 take care of these requirements.
Definition 2.14. The cluster size distribution of the strictly stationary process X_1, …, X_n is

π_n(j) = Pr{ Σ_{i=1}^{r_n} I(X_i > u_n) = j | Σ_{i=1}^{r_n} I(X_i > u_n) ≥ 1 }.
Theorem 2.15 (Hsing et al. (1988)). Let X_1, …, X_n be a strictly stationary random process with margin F. Define the sequence u_n(τ) such that n F̄(u_n(τ)) → τ as n → ∞. Suppose that for each τ > 0, X_i satisfies condition Δ(u_n(τ)), and that the limiting cluster size distribution lim_{n→∞} π_n(j) = π(j) exists. Then if η_n^{(τ)} converges to a point process η^{(τ)}, the limit η^{(τ)} is necessarily a compound Poisson process with intensity θτ and marks following the distribution π.
A complete point process limit, the counterpart of Theorem 2.13 for the point process

N_n = Σ_{i=1}^n ε_{(i/n, (X_i − b_n)/a_n)},

given on the product space R_+ × R_+, is stated by Hsing (1987). This limit theorem needs a stricter asymptotic dependence condition, requiring asymptotic independence simultaneously on multiple levels. Let k and n be positive integers, and let B_{i,j}(τ_1, …, τ_k) denote the σ-algebra generated by the events {X_r ≤ u_n(τ_m), i ≤ r ≤ j, m = 1, …, k}. For any choice of τ_1, …, τ_k > 0 and l = 1, …, n − 1, define

α(n, l, τ_1, …, τ_k) = max_{A,B,s} |Pr{A ∩ B} − Pr{A} Pr{B}|,

where A ∈ B_{1,s}(τ_1, …, τ_k), B ∈ B_{s+l,n}(τ_1, …, τ_k), and s = 1, …, n − l. Then

Condition Δ*(u_n) is said to be satisfied if for each choice of k and τ_1, …, τ_k, α(n, ⌊λn⌋, τ_1, …, τ_k) → 0 as n → ∞ for all λ ∈ (0, 1).

The convergence of the point process N_n can be stated under condition Δ*(u_n).
Theorem 2.16 (Hsing (1987)). Let X_1, …, X_n be a strictly stationary random process satisfying condition Δ*(u_n) for a sequence of functions u_n(τ). Suppose that

Pr{(M_n − b_n)/a_n ≤ x} → G(x)

for some nondegenerate G with left and right endpoints x_L and x_R. Set ν(x, ∞] = −log G(x), and suppose that condition Δ*(u_n) holds for u_n = a_n x + b_n. If

N_n →_d N as n → ∞

for some point process N, then N has the representation

N = Σ_{i=1}^∞ Σ_{j=1}^{K_i} ε_{(S_i, X_{ij})},

where (S_i, X_{i1}) are the points of a nonhomogeneous Poisson point process with mean measure |·| × ν(·) on [0, ∞) × (x_L, x_R], and the X_{ij} are such that the variables

Y_{ij} = log G(X_{ij}) / log G(X_{i1}), j = 1, …, K_i,

are for each i the points of a point process γ_i on [1, ∞) with 1 as an atom. The γ_i are independent of the nonhomogeneous Poisson process (S_i, X_{i1}) and of each other, and are identically distributed.
The limiting process is constituted of the points (S_i, X_{ij}). For each i, there are K_i points (S_i, X_{ij}), which occur at the same time S_i ∈ (0, 1]. This group of points corresponds to a cluster of extremes, with X_{i1} being the largest value in the cluster. The other points (S_i, X_{ij}) at S_i are determined by a mark process γ_i that generates the normalized relative levels Y_{ij}. Its atom at 1 corresponds to the cluster maximum X_{i1}. The distribution of the process γ_i depends on the local dependence of the sequence X_i, and can indeed be any distribution (Mori, 1977). For a stationary series satisfying the asymptotic independence condition Δ*(u_n), the theorem states that the distribution of the relative sizes of exceedances is the same for each cluster, and different clusters are independent of each other.
2.1.4 Interpretation of the extremal index
The models described above give insight into the effect of serial dependence from several points of view. A comparison of Theorems 2.7 and 2.5 yields its first main consequence. For a stationary random sequence X_i with marginal distribution function F, define its associated independent sequence X̃_i, an independent, identically distributed series with the same margin F. Denote the maxima of X_1, …, X_n and X̃_1, …, X̃_n by M_n and M̃_n, respectively. The information about the relationship between the distribution functions of the maxima of the dependent and the independent sequences is comprised in the extremal index.
Theorem 2.17 (Leadbetter et al. (1983), Corollary 3.7.3). Suppose that the stationary sequence X_i has extremal index θ > 0. Then Pr{(M_n − b_n)/a_n ≤ x} → G(x) if and only if Pr{(M̃_n − b_n)/a_n ≤ x} → G(x)^{1/θ}.

Theorem 2.17 shows that although the admissible nondegenerate limiting distributions are the same in the dependent and the associated independent series, the maximum of a dependent sequence is stochastically smaller than that of its associated independent sequence, and the decrease is quantified by the extremal index.
The other main consequence is the clustering of exceedances created by the serial dependence in the process, which is a new aspect compared to the independent case. Divide the sequence X_i into s_n blocks of size r_n such that s_n → ∞, s_n α_{n,l_n} → 0 and s_n l_n = o(n); then r_n = ⌊n/s_n⌋. This corresponds to the partition used in Condition Δ(u_n). Let Z denote the number of exceedances in one block: Z = Σ_{i=1}^{r_n} I(X_i > u_n). Then, under some additional conditions that allow expected values and limits to be exchanged (Leadbetter, 1983),

E(Z | Z ≥ 1) → θ^{−1}.

The extremal index is thus equal to the inverse of the mean cluster size. This agrees with Theorem 2.15: when the expected number of exceedances is τ, say, we expect only θτ locations where clusters occur. The extremal index summarizes the information about the mean size of the clusters.
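The relation E(Z | Z ≥ 1) → θ^{−1} can be checked numerically on a process with known extremal index. The max-autoregressive (ARMAX) process X_i = max{αX_{i−1}, (1 − α)Z_i} with unit-Fréchet innovations is a standard textbook example with θ = 1 − α; it is used here as an illustration of mine and is not one of the thesis's simulation examples.

```python
import numpy as np

def armax(n, alpha, rng):
    """ARMAX process X_i = max(alpha*X_{i-1}, (1-alpha)*Z_i) with i.i.d.
    unit-Frechet Z_i; stationary unit-Frechet margins, extremal index 1 - alpha."""
    z = 1.0 / -np.log(rng.uniform(size=n))   # unit-Frechet innovations
    x = np.empty(n)
    x[0] = z[0]
    for i in range(1, n):
        x[i] = max(alpha * x[i - 1], (1 - alpha) * z[i])
    return x

def mean_cluster_size(x, u, r):
    """Empirical E(Z | Z >= 1): mean number of exceedances of u per
    block of length r, among blocks containing at least one exceedance."""
    k = len(x) // r
    z = (x[:k * r].reshape(k, r) > u).sum(axis=1)
    return z[z >= 1].mean()

rng = np.random.default_rng(11)
x = armax(500_000, alpha=0.5, rng=rng)
u = np.quantile(x, 0.999)
mcs = mean_cluster_size(x, u, r=100)   # roughly 1/theta = 2 for alpha = 0.5
```

The threshold must be high relative to the block length (here r F̄(u) = 0.1) for the block count to approximate E(Z | Z ≥ 1) well; with a lower threshold, several clusters share a block and the empirical mean cluster size is biased upwards.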
Theorem 2.16 shows also that the precise structure of the clusters is not determined by the conditions imposed on the process X_i. A few different structures from simple processes, which will serve later as simulation examples, are shown in Figure 2.1. Leadbetter (1983), Leadbetter and Nandagopalan (1989) and Chernick et al. (1991) introduced a hierarchy of further conditions D^{(K)}(u_n) to summarize cluster structure in clear terms.

[Figure 2.1: Clusters of different processes, shown in three panels: AR(1), AR(2) and Markov chain. The samples are from the examples used and presented in Section 3.1.]
Condition D^{(K)}(u_n) is said to be satisfied if there exist sequences of integers s_n and l_n such that condition D(u_n) holds with l_n → ∞, s_n → ∞, s_n α_{n,l_n} → 0 and s_n l_n = o(n), for which

n Pr{X_1 > u_n, M_{2,K+1} ≤ u_n, M_{K+2,r_n} > u_n} → 0 as n → ∞,

where M_{i,j} = max{X_t : i ≤ t ≤ j}.
We take a block of size r_n and partition it into three components. The first is the initial observation, the second is a following block of length K, and the third is the remaining part of the block (the definition given here differs from the original introduced in Chernick et al. (1991) in the value of K). For K ≥ 1, the condition requires that the expected number of events in which an initial exceedance is followed by K non-exceedances and at least one exceedance amongst the following r_n − (K + 1) observations should tend to zero. When this does not hold, the probability of such events does not decrease to zero; thus, they can be found at all thresholds u_n and sample sizes n. We might say that this is a characteristic cluster pattern, since it will occur at every threshold. In this sense, the value of K can be considered as the extreme-level dependence range of the process; for instance, D^{(1)}(u_n) holding is equivalent to saying that all clusters in the limit are formed by contiguous sequences of exceedances. This is the case for the AR(1) example of Figure 2.1; for the AR(2), D^{(1)}(u_n) clearly does not hold, but it turns out that D^{(6)}(u_n) does. For K = 0, we may extend the condition by introducing the convention M_{i,j} = −∞ for i > j; condition D^{(0)}(u_n) is thus equivalent to n Pr{X_1 > u_n, M_{2,r_n} > u_n} → 0 as n → ∞, requiring that in the block of r_n − 1 observations after an exceedance there is no other exceedance, so that in the limit exceedances occur individually. The extremal index is related to the conditional distribution of the maximum in the block of length K after an exceedance.
Theorem 2.18 (Chernick et al. (1991)). Suppose that for some K, the conditions D(u_n) and D^{(K)}(u_n) hold for u_n(τ) for all τ > 0. Then the extremal index of X_i exists and is equal to θ if and only if

Pr{M_{2,K+1} ≤ u_n(τ) | X_1 > u_n(τ)} → θ for all τ > 0.

This is equivalent to Pr{M_{2,r_n} ≤ u_n(τ) | X_1 > u_n(τ)} → θ under D^{(K)}(u_n), since if the condition holds, the probability of an exceedance occurring in the last part X_{K+2}, …, X_{r_n} tends to zero. The first interpretation of the extremal index as a conditional probability was given by O'Brien (1987).

These implications give rise to a number of estimation methods for the extremal index.
2.1.5 Methods for the estimation of the extremal index
The blocks method
The blocks estimator is based on the interpretation of the extremal index as the inverse of the mean cluster size (Leadbetter, 1983). Choose a threshold u, select a partition of the sample into appropriate blocks of size r, with k blocks in the partition, and denote the maximum of block i by M_r^{(i)}. Define the empirical counterpart of E(Z | Z ≥ 1)^{−1} as an estimator for θ, where Z = Σ_{i=1}^r I(X_i > u):

θ̂_n^{(B)} = Σ_{i=1}^k I(M_r^{(i)} > u) / Σ_{i=1}^{kr} I(X_i > u).   (2.9)

The numerator of (2.9) counts the blocks having at least one exceedance, whereas the denominator counts all the exceedances. The estimator is based on the notion that blocks of length r_n = o(n), when rescaled by n, shrink to a point in the limit as n → ∞, and thus all exceedances of the increasing threshold u_n within a block merge into one cluster.
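A direct implementation of (2.9) is straightforward; the sketch below is mine, not from the thesis, and simply discards the incomplete final block.

```python
import numpy as np

def blocks_estimator(x, u, r):
    """Blocks estimator (2.9): number of blocks of length r with at least
    one exceedance of u, divided by the total number of exceedances in
    the k*r observations used (the incomplete last block is discarded)."""
    x = np.asarray(x)
    k = len(x) // r                       # number of complete blocks
    y = x[:k * r].reshape(k, r)
    n_exc = (y > u).sum()                 # denominator: all exceedances
    if n_exc == 0:
        return np.nan
    return (y.max(axis=1) > u).sum() / n_exc
```

On a toy series in which exceedances arrive in pairs inside a block, the estimate is 1/2, the inverse mean cluster size; on independent data with a high threshold it is close to 1.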
The blocks estimator is consistent under some conditions, which are detailed in Weissman and Novak (1998): θ̂_n^{(B)} →_P θ, where →_P denotes convergence in probability; it is asymptotically normal if θ < 1 (see also Hsing (1991)).
The main problem of estimation by the blocks method is the choice of the parameters: the threshold u and especially the block size r. For threshold choice, there are theoretically well-founded methods in the literature, such as the mean excess plot or the stability of the parameters of a generalized Pareto fit. An additional method is proposed in this thesis, based on misspecification tests for the validity of the limiting point process model. For block size selection, at present there are no such methods rooted in statistical theory, though the misspecification tests, when applied to the generalized extreme-value modelling of block maxima, may provide help here. Simulations and real data applications show that the estimates are highly sensitive to the choice of the block size.
The runs method
The runs method is based on the interpretation of the extremal index as the limit of the conditional probability Pr{M_{2,r_n} ≤ u_n | X_1 > u_n} as n → ∞ (O'Brien, 1974, 1987). For an appropriately chosen threshold u and run parameter r, define the sample analogue of this conditional probability by

θ̂_n^{(R)} = Σ_{i=1}^n I(X_i > u, M_{i+1,i+r−1} ≤ u) / Σ_{i=1}^n I(X_i > u).   (2.10)

This is called the runs estimator. The numerator counts the number of exceedances in the process that are followed by at least r − 1 non-exceedances. Those exceedances followed by another exceedance within a period of length r − 1 are discarded: they are considered as belonging to the same cluster as the following exceedance. Thus, a cluster begins with an exceedance after at least r − 1 non-exceedances, continues while there is always a new exceedance within a distance r − 1 of the preceding one, and terminates with the exceedance that is followed by at least r − 1 consecutive non-exceedances. These structures model the clusters of the limiting process. The runs estimator, like the blocks method, corresponds to an empirical version of the inverse mean cluster size, but it identifies clusters using a different principle.
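An implementation of (2.10) can be sketched as follows (my code; the convention M_{i,j} = −∞ for i > j handles exceedances near the end of the series):

```python
import numpy as np

def runs_estimator(x, u, r):
    """Runs estimator (2.10): the proportion of exceedances of u that
    are followed by at least r - 1 consecutive non-exceedances."""
    x = np.asarray(x)
    exc = x > u
    n_exc = exc.sum()
    if n_exc == 0:
        return np.nan
    ends = 0
    for i in np.flatnonzero(exc):
        # M_{i+1, i+r-1} <= u: no exceedance among the next r - 1 values;
        # an empty or truncated slice counts as no exceedance (M = -inf)
        if not exc[i + 1 : i + r].any():
            ends += 1
    return ends / n_exc
```

On the paired toy series used for the blocks estimator, the runs estimator with r = 3 also returns 1/2, since every second exceedance of a pair terminates a cluster.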
The runs estimator is consistent under certain assumptions, and if θ < 1, it is asymptotically normal under some additional conditions (Hsing, 1993; Weissman and Novak, 1998). In terms of its second-order asymptotic bias, the runs estimator is preferable to the blocks estimator.
Estimation of the extremal index by the runs estimator requires a choice of threshold u and run parameter r. For the threshold, the same procedures can be applied as for extreme-value analysis in general. The selection of the run parameter is not guided by any theoretical consideration, and is largely arbitrary in practice. Since the identified clusters in a given series are sensitive to the value of r, and different r values can imply different counts of clusters, the estimate θ̂_n^{(R)} can vary greatly with r. The misspecification tests proposed in this thesis can provide guidelines for the selection of the run parameter, since the run parameter is the minimal separation between groups of extremes that allows us to assume them independent. This will be discussed in Section 3.1. The characteristics of the runs and the blocks estimators are also discussed in Smith and Weissman (1994).
The intervals method
The extremal index is also related to the distribution of the times between exceedances. In the compound Poisson limit on (0, 1], these periods must either be zero, corresponding to times between extremes within the same cluster, or exponential random variables with parameter θτ, with τ = lim_{n→∞} n F̄(u_n) being the limit of the expected number of exceedances using the sequence u_n. A normalization by F̄(u_n) maps the point process of exceedances onto the interval (0, τ] instead of (0, 1], which was achieved by normalizing the times by n^{−1}, as used in the basic theorems on point processes of exceedances. Using the normalization by F̄(u_n), the parameter of the exponential distribution becomes θ and therefore relatively easily estimable, and Ferro and Segers (2003) formulated a moment estimator for the extremal index based on this distribution.
Define the inter-exceedance time in the stationary sequence X_i by

T(u_n) = min{k ≥ 1 : X_{k+1} > u_n | X_1 > u_n}.

Then

Pr{T(u_n) > t} = Pr{M_{2,t+1} ≤ u_n | X_1 > u_n}.
Introduce a new asymptotic independence condition.

Condition Δ̃(u_n): for any A ∈ F_{1,k}(u_n) with P(A) > 0, B ∈ F_{k+l,cr_n}(u_n) and k = 1, …, cr_n − l, we have

|P(B | A) − P(B)| ≤ α̃(cr_n, l),

and there exists a sequence l_n = o(n) for which α̃(cr_n, l_n) → 0 as n → ∞ for all c > 0.
Ferro and Segers (2003) proved the following theorem (see also Segers (2002), who dealt with the problem in different terms):
Theorem 2.19. If there exist sequences of integers {r_n} and of thresholds {u_n} such that

(i) r_n → ∞, r_n F̄(u_n) → τ and P{M_{r_n} ≤ u_n} → e^{−θτ} as n → ∞ for some τ ∈ (0, ∞) and θ ∈ (0, 1], and

(ii) condition Δ̃(u_n) is satisfied,

then as n → ∞,

P{F̄(u_n) T(u_n) > t} → θ exp(−θt) for all t > 0.   (2.11)
The intervals estimator of the extremal index is based on the bias-corrected coefficient of variation of the variable T(u) for a choice of u. Writing N for the number of exceedances of u and T_i, i = 1, …, N − 1, for the observed inter-exceedance times,

θ̂_n^{(I)} = min{1, 2(Σ_{i=1}^{N−1} T_i)² / [(N − 1) Σ_{i=1}^{N−1} T_i²]}   if max{T_i : 1 ≤ i ≤ N − 1} ≤ 2,

θ̂_n^{(I)} = min{1, 2[Σ_{i=1}^{N−1} (T_i − 1)]² / [(N − 1) Σ_{i=1}^{N−1} (T_i − 1)(T_i − 2)]}   if max{T_i : 1 ≤ i ≤ N − 1} > 2.   (2.12)
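Formula (2.12) translates directly into code. This sketch (mine, not from the thesis) takes the exceedance times of u, differences them, and applies whichever branch the maximum inter-exceedance time selects.

```python
import numpy as np

def intervals_estimator(x, u):
    """Intervals estimator (2.12) of Ferro and Segers (2003)."""
    x = np.asarray(x)
    exc_times = np.flatnonzero(x > u) + 1          # 1-based exceedance times
    N = len(exc_times)
    if N < 2:
        return np.nan                              # no inter-exceedance times
    T = np.diff(exc_times).astype(float)           # inter-exceedance times
    if T.max() <= 2:
        est = 2.0 * T.sum() ** 2 / ((N - 1) * (T ** 2).sum())
    else:
        est = (2.0 * (T - 1.0).sum() ** 2
               / ((N - 1) * ((T - 1.0) * (T - 2.0)).sum()))
    return min(1.0, est)
```

Only the threshold has to be supplied; on independent data with a high threshold the estimate is close to 1, consistent with the absence of clustering.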
The intervals estimator is consistent (Segers, 2002; Ferro and Segers, 2003). As asymptotic normality has not been proved, the assessment of uncertainty is performed by a bootstrap procedure. The first step is to take the largest C = ⌊θ̂_n^{(I)}(N − 1)⌋ inter-exceedance times as inter-cluster times, and divide the rest into C series of within-cluster times, keeping together the consecutive sequences that describe a complete observed cluster from the preceding inter-cluster time until the next. Then, resample C inter-cluster times with replacement from the set of inter-cluster times and C series of within-cluster times from the rest. Intercalating them to obtain a replicate of the process and repeating the procedure R times, a simulated distribution of any cluster functional can be obtained, and the empirical α- and (1 − α)-quantiles can be used to obtain (1 − 2α)-confidence intervals.

Compared to the runs and the blocks estimators, the intervals estimator needs no choice of auxiliary parameter other than a threshold, and it avoids the need for preliminary cluster identification. Clusters can be identified post-estimation, based on the estimated θ. The problem of sensitivity to threshold choice persists for the intervals estimator, however.
The two-threshold method
This method, like the blocks and the runs estimators, is based on the interpretation of the extremal index as the inverse of the mean cluster size, but with a refined cluster definition (Laurini and Tawn, 2003). Very differently from the paradigm of extreme-value statistics, it incorporates information on the process at lower levels, from two additional sources: the trajectory of the process around exceedances and the cause of dependence. The first source is included by defining a lower threshold c: a cluster is terminated either if there is a run of non-exceedances of a pre-specified length or if there is an X_i ≤ c within this distance. The second source is the categorization of the process as a positively associated autoregressive-type process or a volatility-driven process. According to this distinction, two definitions are given for the two-threshold estimator, the one referring to volatility-driven processes needing a full modelling of the process.
Autoregressive-type dependence

Let x_L = inf{x : F(x) > 0} and x_R = sup{x : F(x) < 1} denote the lower and upper endpoints of the distribution of the stationary sequence X_i, and consider two thresholds x_L ≤ c ≤ u < x_R. For i < j, define

L^X_{i,j} = min{X_i, …, X_j},  M^X_{i,j} = max{X_i, …, X_j}.

Define also the event

T_{i,c,u} = {L^X_{2,i−1} > c, M^X_{2,i−1} < u, X_i ≤ c},

which corresponds to X_2, …, X_{i−1} remaining between c and u, and X_i dropping below c. Conditioned on X_1 > u, such an event ends a cluster of exceedances. The two-threshold approximation of the extremal index is

θ_X(u, m, c) = Pr{ {M^X_{2,m} ≤ u} ∪ (∪_{i=2}^{m−1} T_{i,c,u}) | X_1 > u }.

After decomposing the event in the argument into disjoint events, and taking the empirical counterpart of the probabilities, we can obtain the two-threshold estimator for processes with autoregressive-type dependence:

θ̂_X(u, m, c) = Σ_{i=1}^{n−(m−1)} W_i { Π_{j=1}^{m−1} B_{i+j} + Σ_{k=1}^{m−1} S_{i+k} Π_{r=1}^{k−1} B_{i+r} } / Σ_{i=1}^n W_i   (2.13)

with the notation W_i = I(X_i > u), B_i = I(c < X_i ≤ u) and S_i = I(X_i ≤ c).
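The decomposition in (2.13) can be coded directly. The sketch below (mine, not from Laurini and Tawn) evaluates, for each exceedance, whether the following m − 1 observations all stay in (c, u], or whether the path first drops below c after staying in (c, u]; the two cases are disjoint because B_i and S_i cannot both hold.

```python
import numpy as np

def two_threshold_estimator(x, u, m, c):
    """Two-threshold estimator (2.13), autoregressive-type dependence:
    W_i = I(x_i > u), B_i = I(c < x_i <= u), S_i = I(x_i <= c)."""
    x = np.asarray(x)
    n = len(x)
    W = x > u
    B = (x > c) & (x <= u)
    S = x <= c
    num = 0.0
    for i in np.flatnonzero(W[: n - (m - 1)]):
        if B[i + 1 : i + m].all():           # all m-1 following values in (c, u]
            num += 1.0
            continue
        for k in range(1, m):                # first drop below c, preceded by B's
            if S[i + k] and B[i + 1 : i + k].all():
                num += 1.0
                break
    denom = W.sum()
    return num / denom if denom else np.nan
```

On a short hand-checkable series with u = 9 and c = 5, one exceedance is terminated by a drop below c, a second is immediately followed by another exceedance, and the third is followed by m − 1 values in (c, u], giving two cluster ends out of three exceedances.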
Volatility-driven processes

Suppose that X_i = σ_i Z_i, where Z_i is a zero-mean independent and identically distributed sequence, σ_i^2 is the conditional variance, which depends on past values of X_i and/or σ_i, and Z_i and σ_i are independent. In such processes, it can be useful to associate clusters of extreme values with high-volatility periods. For i < j, define

L^σ_{i,j} = min{σ_i, …, σ_j},  M^X_{i,j} = max{X_i, …, X_j}.

For the thresholds u ∈ [x_L, x_R] and c ∈ (0, σ_R], with σ_R defined similarly to x_R, define the event

V_{i,c,u} = {L^σ_{2,i−1} > c, M^X_{2,i−1} ≤ u, σ_i ≤ c},

which corresponds to X_2, …, X_{i−1} being below u, σ_2, …, σ_{i−1} remaining above c, and σ_i dropping below c. Conditioned on X_1 > u, such an event ends a cluster of exceedances. The two-threshold approximation of the extremal index in this case is given by

θ_σ(u, m, c) = Pr{ {M^X_{2,m} ≤ u} ∪ (∪_{i=2}^{m−1} V_{i,c,u}) | X_1 > u }.

To implement this as an estimator, we also need an estimate of the unobserved volatility process, so the first step is to fit a stochastic volatility-type model to X_i and to obtain σ̂_i. The two-threshold estimator for volatility-driven processes is then

θ̂_σ(u, m, c) = Σ_{i=1}^{n−(m−1)} W_i { Π_{j=1}^{m−1} B_{i+j} + Σ_{k=1}^{m−1} S_{i+k} Π_{r=1}^{k−1} B_{i+r} } / Σ_{i=1}^n W_i,   (2.14)

where the meaning of the notation is changed from (2.13): W_i = I(X_i > u), B_i = I(X_i ≤ u, σ̂_i > c) and S_i = I(X_i ≤ u, σ̂_i ≤ c).
Consistency and asymptotic normality have not been formally proven for the two-threshold estimator. Although an additional auxiliary parameter is introduced into the estimation, the sensitivity to the selection of the upper and lower thresholds and the run parameter seems to decrease, and the estimator shows remarkably stable behaviour with respect to these parameters. A major drawback is that for volatility-driven processes, modelling of the process is necessary.
Applicability of the methods in nonstandard situations
How do these estimators behave when faced with the real-life climatological situation? The first
step, when we want to infer the characteristics of the extremes of a climatic time series, is always
to choose a threshold. For this, well-founded methods are given in the literature.
Stability of the GPD parameters The generalized Pareto model is a conditional model for the
distribution of the excesses of a high threshold. If the normalized maximum $M_n$ of a
stationary sequence $X_1, \ldots, X_n$ has a limiting generalized extreme-value distribution with
parameters $\mu$, $\sigma$ and $\xi$, then the excesses over a high threshold $u$, $Y_i = X_i - u$ for any $i$,
follow the generalized Pareto distribution
\[
\Pr(Y_i > y \mid X_i > u) = \Bigl( 1 + \frac{\xi y}{\tilde\sigma} \Bigr)^{-1/\xi},
\]
defined on $\{y : y > 0,\ 1 + \xi y / \tilde\sigma > 0\}$, where $\tilde\sigma = \sigma + \xi(u - \mu)$. It can be proven that
if the generalized Pareto distribution is valid for a threshold $u_0$ with parameters $\tilde\sigma_{u_0}$ and
$\xi_{u_0}$, then it must be valid also for all thresholds $u > u_0$, with parameters $\xi_u = \xi_{u_0}$ and
$\tilde\sigma_u = \tilde\sigma_{u_0} + \xi_u (u - u_0)$. That is, above the lowest valid threshold, the parameters $\xi_u$ and
$\tilde\sigma_u - \xi_u u$ should be constant. This can be exploited for threshold choice. For a given data
set, we estimate the GPD parameters and their standard errors using a series of thresholds,
and choose the lowest one above which the estimates are acceptably stable.
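This diagnostic can be sketched as follows. For self-containedness the GPD is fitted here by the method of moments (valid for $\xi < 1/2$), rather than by the maximum likelihood fits with standard errors that one would normally use; all names and numerical choices are illustrative.

```python
def gpd_mom_fit(excesses):
    """Method-of-moments estimates of the GPD parameters (xi, sigma),
    valid for xi < 1/2: xi = (1 - m^2/s^2)/2, sigma = m(m^2/s^2 + 1)/2,
    with m and s^2 the sample mean and variance of the excesses."""
    n = len(excesses)
    m = sum(excesses) / n
    s2 = sum((y - m) ** 2 for y in excesses) / (n - 1)
    ratio = m * m / s2
    return 0.5 * (1.0 - ratio), 0.5 * m * (ratio + 1.0)

def gpd_stability(x, thresholds):
    """For each threshold u, fit the GPD to the excesses and report
    (u, xi_u, sigma_u - xi_u * u); the last two columns should be
    roughly constant above the lowest valid threshold."""
    rows = []
    for u in thresholds:
        exc = [xi - u for xi in x if xi > u]
        if len(exc) >= 2:
            xi_hat, sig_hat = gpd_mom_fit(exc)
            rows.append((u, xi_hat, sig_hat - xi_hat * u))
    return rows
```

The modified scale $\tilde\sigma_u - \xi_u u$ is reported instead of $\tilde\sigma_u$ precisely because the latter varies linearly in $u$ above a valid threshold.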
Mean excess plot If the variable $Y = X - u$ follows a generalized Pareto distribution, it has
expected value $E(Y \mid X > u) = \tilde\sigma_u / (1 - \xi)$ if $\xi < 1$. Since above a valid threshold $u_0$
the generalized Pareto distribution remains valid, and in this case $\tilde\sigma_u$ depends linearly on
$u > u_0$, the expected value of the excesses of increasing thresholds is also linear in $u$. For a
range of thresholds $u_j$, the mean excess
$\sum_{i=1}^{n} (X_i - u_j) I(X_i > u_j) \bigm/ \bigl\{ \sum_{i=1}^{n} I(X_i > u_j) \bigr\}$ in
a sample $X_i$ can be calculated and plotted against the thresholds $u_j$. The smallest value
$u$ above which the graph is approximately linear can be selected for the analysis.
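The empirical mean excess function is a one-liner per threshold; a minimal sketch (plotting against the $u_j$ is left out):

```python
def mean_excess(x, thresholds):
    """Empirical mean excess e(u): the mean of X - u over observations
    with X > u. Plotted against u, the curve should be approximately
    linear above the lowest threshold at which the GPD model holds."""
    pairs = []
    for u in thresholds:
        exc = [xi - u for xi in x if xi > u]
        if exc:
            pairs.append((u, sum(exc) / len(exc)))
    return pairs
```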
Nevertheless, the situation is not always clear. Threshold dependence is checked only through the
generalized Pareto model, since both the mean excess plot and the parameter stability method
stem from it. It seems that the stability of GPD modelling does not always ensure the stability
of the modelling of the clusters: this thesis will present a case study in Section 3.2 where the
extremal index estimates show threshold dependence even at levels where GPD modelling yields
stable parameter estimates. All the methods listed so far for the estimation of the extremal index
are sensitive to threshold choice (Laurini and Tawn, 2003), and there is no guarantee that we
obtain stable estimates of $\theta$, even when the GPD-based diagnostics show no problems.
Additional issues emerge for the blocks, runs and two-threshold methods: the choice of
other auxiliary parameters, such as the block size, run parameter or lower threshold. At present,
there are no theoretically grounded rules for selecting these, only guidelines distilled from the
statistical experience of the analyst or from the known particularities of the data set. Nor is
there any way to estimate the bias entailed by a wrong selection of the auxiliary parameters,
or any formal account of the additional uncertainty of the estimate arising from the selection
uncertainty.
These inconveniences are present even for a data set that is otherwise well behaved: that is,
one that is stationary and satisfies the appropriate variant of the asymptotic independence
conditions, such as $D(u_n)$ or $\Delta(u_n)$. But these conditions are often not met in reality. In fact,
nonstationarity is inherent to the recent issues of interest in climatology, given that one of the
main goals is to assess the effects of climate change. Nonstationarity undermines the foun-
dations of these methods, since one of their fundamental assumptions does not hold. Statistical
analyses of nonstationary processes can nevertheless be performed in various ways, two of which
are the inclusion of time as a covariate in a parametric model, and the use of semiparametric
methods such as smoothing. Neither can be combined with the methods listed above, whose
use in climate studies is therefore quite limited.
The independence conditions can also be broken. Long-range dependence is a well-known
characteristic of many climatic time series (Gil-Alana, 2008). Such behaviour can easily stem
from the complex system of partial differential equations that governs the climate on long timescales.
For the extremes of long-memory processes there is currently no general theory, only
special models, and certainly no methods to estimate their extremal clustering characteristics.
Separating the observed series into a time series model, possibly with long-range dependence,
and residuals may help to solve this problem. If the residuals come sufficiently close to satisfying
the asymptotic independence conditions of extreme-value statistics, then it is safer to apply the
methods for extremes to these residuals instead of the original time series.
It thus seems necessary to possess methods to recognize these problems. Both non-
stationarity and long-range dependence can have serious implications for model validity and the
quality of the resulting estimate, as can the selection of the auxiliary parameters. More-
over, in addition to such diagnostics, it would be wise to construct methods that are adaptable
to nonstationarity. The purpose is often the detection of time dependence remaining after the
removal of time dependence from known sources, so methods built on the assumption
of underlying stationarity may be useless. In this thesis, we consider likelihood methods as
an alternative to the existing estimators of the extremal index. The likelihood also offers a way
to construct diagnostics that reveal model validity problems. All the issues presented
above result in inference based on an invalid, that is, misspecified, model, and this
problem has a large literature in econometrics. The methods proposed there are general, adapt-
able to the mixed exponential-point mass models, and can help in the selection of the auxiliary
parameters and in the detection of nonstationarity or long-memory character. The uncertainty
linked to the misspecification can also be quantified.
2.2 Multivariate extreme values
2.2.1 Multivariate extremes in the independent case
When investigating the extremal characteristics of multivariate time series, important new as-
pects come into play. Extreme-value modelling must deal with the dependence between the
components of the observed vector. Suppose that $X_1, \ldots, X_n$ is a random sequence of $D$-
dimensional vectors $X_i = (X_{i1}, \ldots, X_{iD})$, with common distribution function $F(x)$
and componentwise margins $F_d(x_d)$. The joint distribution function $F(x)$ can factorize as
$F(x) = \prod_{d=1}^{D} F_d(x_d)$, or have a general form. On this basic dependence, dependence
between the observations $X_i$ over time can be superposed. As in the univariate case, this im-
plies certain relations between serially dependent and associated serially independent sequences
in the extreme-value limit, and involves a function $\theta : \mathbb{R}^D \to [0, 1]$ with a role similar to that of the
univariate extremal index.
As in the univariate case, we begin with the limit theorems for vectors $X_i$ independent over
time. To do so, we first introduce some conventions and definitions. All operations and
order relations are taken componentwise; examples are
\[
a x + b = (a_1 x_1 + b_1, \ldots, a_D x_D + b_D), \qquad
\max\{x, y\} = (\max\{x_1, y_1\}, \ldots, \max\{x_D, y_D\}),
\]
and $x \le y$ if and only if $x_d \le y_d$ for all $d = 1, \ldots, D$.
According to these conventions, we define the sample maximum in the multivariate case as
$M_n = \max\{X_i : i = 1, \ldots, n\}$, whose components are $M_{nd} = \max\{X_{id} : i = 1, \ldots, n\}$.
The vector $M_n$ is thus not necessarily a sample vector. If the sequence $X_i$ is independent, then
the distribution of the componentwise maximum is $\Pr\{M_n \le x\} = F^n(x)$ for $x \in \mathbb{R}^D$, and just
as in the univariate case, we look for sequences $a_n > 0$ and $b_n$ that lead to a nontrivial limit
distribution, that is,
\[
F^n(a_n x + b_n) \xrightarrow{d} G(x) \quad \text{as } n \to \infty,
\]
where $G(x)$ is a multivariate distribution function. Since a sequence of random vectors can
converge in the weak sense only if all the marginal sequences converge, the margins $G_d(x_d)$ must
be univariate extreme-value distributions (Beirlant et al., 2004).
The arguments for the possible forms of $G(x)$ can be based on the reasoning used in the
univariate case: if a distribution function can occur as a nondegenerate limit for the
normalized maxima, then it must be max-stable, that is,
\[
G^k(\alpha_k x + \beta_k) = G(x). \tag{2.15}
\]
The identification of the possible forms of this class does not lead to a simple finite-parameter
family in the multivariate case. The only restriction on $G$ implied by the max-stability property
is that $G^{1/k}(x)$ must be a distribution function for every positive integer $k$. This property is
called infinite divisibility, and it implies that $G(x)$ can be written in the form
\[
G(x) =
\begin{cases}
\exp\{-\mu([-\infty, x]^c)\}, & x \in [-\infty, \infty), \\
0, & \text{otherwise},
\end{cases}
\]
where the superscript $c$ denotes the complement, and $\mu$ is a measure on $[-\infty, \infty)$ (Balkema and
Resnick, 1977), called the exponent measure, which is in general not unique. This freedom
can be used to require the exponent measure to be concentrated on $[q, \infty) \setminus \{q\}$ for any $q$. We
will do so in the sequel using the particular value $q = 0$, and denote the space $[0, \infty) \setminus \{0\}$ by
$E$. With the notation $V(x) = \mu([0, x]^c)$, we obtain the form
\[
G(x) = \exp\{-V(x)\}.
\]
The choice of the margins is also largely arbitrary, since we can transform any continuous
marginal distribution to a chosen continuous standard form. Denoting the inverse function of
$G_d$ by $G_d^{\leftarrow}$, the transformation
\[
G^*(x) = G\bigl( G_1^{\leftarrow}(e^{-1/x_1}), \ldots, G_D^{\leftarrow}(e^{-1/x_D}) \bigr)
\]
leads to a distribution $G^*$ with unit Fréchet margins.
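In practice, the same standardization is applied to data before a multivariate analysis: if $X$ has continuous distribution function $F$, then $-1/\log F(X)$ is unit Fréchet. A rank-based empirical sketch (the $n + 1$ denominator is one common convention, an assumption here rather than a prescription of the thesis):

```python
import math

def to_unit_frechet(sample):
    """Transform a sample to approximately unit Frechet margins via the
    probability integral transform: U = F(X) is uniform, so -1/log(U)
    has distribution function exp(-1/z), z > 0. F is replaced by the
    empirical distribution function with denominator n + 1, which keeps
    the argument of the logarithm strictly inside (0, 1)."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [-1.0 / math.log(r / (n + 1.0)) for r in ranks]
```

The transformation is monotone, so the temporal ordering of extremes in each margin is preserved.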
From now on, let $G$ be a multivariate extreme-value distribution with unit Fréchet margins,
and $V(x)$ the function defined by its exponent measure. The max-stability property (2.15),
together with the appropriate choice of normalizing constants $\alpha_k = k$ and $\beta_k = 0$ for a
unit Fréchet limit, and a continuity argument, imply that the exponent measure satisfies the
homogeneity property
\[
\mu(B) = t\,\mu(tB)
\]
on all Borel subsets of $E$, with the notation $tB = \{x : t^{-1} x \in B\}$.
Let $\|\cdot\|$ denote an arbitrary norm on $\mathbb{R}^D$, and let $S = \{x \in E : \|x\| = 1\}$ be the unit sphere
in $E$. For a Borel subset $A \subset S$, write $S(A) = \mu\{x : \|x\| > 1, \|x\|^{-1} x \in A\}$, and introduce the
transformation $T$ defined by $T(x) = (\|x\|, \|x\|^{-1} x)$. This is a polar coordinate transformation,
which, by virtue of the homogeneity property, allows us to write $\mu$ as a product measure:
\[
\mu\bigl\{ x \in E : \|x\| > r,\ \|x\|^{-1} x \in A \bigr\} = r^{-1} S(A)
\]
for $r > 0$ and $A \subset S$ Borel. Considering that the image of the set $[0, x]^c$ under the coordinate
transformation $T$ is
\[
T([0, x]^c) = \Bigl\{ (r, a) : r > \min_{d \in \{1, \ldots, D\}} \frac{x_d}{a_d} \Bigr\},
\]
we can express the function $V(x)$ as follows:
\[
V(x) = \int_S \max_{d \in \{1, \ldots, D\}} \frac{a_d}{x_d}\, S(da).
\]
These arguments yield a fundamental result of multivariate extreme-value statistics (de Haan
and Resnick (1977); Resnick (1987)):
Theorem 2.20. $G(x)$ is a multivariate extreme-value distribution with unit Fréchet margins
if and only if there exists a finite measure $S$ on the unit sphere $S$ satisfying
\[
\int_S a_d\, S(da) = 1, \qquad d = 1, \ldots, D,
\]
such that
\[
G(x) = \exp\Bigl\{ -\int_S \max_{d \in \{1, \ldots, D\}} \frac{a_d}{x_d}\, S(da) \Bigr\}.
\]
Two special cases, marking the endpoints of the range of possible behaviour, are com-
plete dependence and complete independence. Let $e_d$ denote the $d$th unit vector in $\mathbb{R}^D$, and $\mathbf{1}$
the vector with components $(1, \ldots, 1)$. Complete dependence corresponds to $S$ being a Dirac
measure concentrated at the point $\mathbf{1}/\|\mathbf{1}\|$, with the corresponding distribution function given
by $G(x) = \min_{d \in \{1, \ldots, D\}} G_d(x_d)$. The independent case is represented by the measure $S$ concen-
trated on $\{e_d,\ d = 1, \ldots, D\}$, and the distribution function $G$ factorizes: $G(x) = \prod_{d=1}^{D} G_d(x_d)$.
Moreover, it can be shown that complete independence of the multivariate extreme-value distri-
bution is equivalent to pairwise independence of all pairs of margins (Berman, 1961). Also,
multivariate extreme-value distributions are always positively associated: it can be proven that
$\mathrm{cov}[f(Y), g(Y)] \ge 0$ for all pairs of nondecreasing functions $f$ and $g$, provided the first and
second moments exist (Marshall and Olkin, 1983).
A useful notion is the stable tail dependence function, which is defined as a reparametrization
of $V(x)$.
Definition 2.21. The stable tail dependence function of $G(x) = \exp\{-V(x)\}$ is
\[
l(\tau) = V(\tau_1^{-1}, \ldots, \tau_D^{-1}),
\]
with $\tau \in [0, \infty]$. Equivalently, in terms of the original max-stable function, it can be given as
\[
l(\tau) = -\log G\bigl\{ G_1^{\leftarrow}(e^{-\tau_1}), \ldots, G_D^{\leftarrow}(e^{-\tau_D}) \bigr\},
\]
and directly in terms of the exponent measure,
\[
l(\tau) = \int_S \max_{d \in \{1, \ldots, D\}} (a_d \tau_d)\, S(da).
\]
For an independent process, the value $\vartheta = l(\mathbf{1})$, with the notation $\mathbf{1} = (1, \ldots, 1)$, is termed
the extremal coefficient.
There are a number of useful properties which express, among others, the max-stability prop-
erty and the requirements of the unit Fréchet margins in terms of the stable tail dependence
function:
(L1) $l(s\tau) = s\,l(\tau)$ for $0 < s < \infty$;
(L2) $l(e_d) = 1$ for $d = 1, \ldots, D$;
(L3) $\max_{d \in \{1, \ldots, D\}} \tau_d \le l(\tau) \le \sum_{d=1}^{D} \tau_d$ for all $\tau$, with the lower bound corresponding to com-
plete dependence, and the upper to independence;
(L4) for $\alpha \in [0, 1]$, $l\{\alpha v + (1 - \alpha) w\} \le \alpha\, l(v) + (1 - \alpha)\, l(w)$, with $v, w \in [0, \infty]$.
Numerous representations other than the one in Theorem 2.20, with unit Fréchet or other
margins, can be given for a multivariate max-stable distribution function, and are discussed
in the literature. The fundamental theorems are summarized in Resnick (1987). A more recent
summary is Beirlant et al. (2004), which describes various representations and new developments
in detail, together with some practical outlook and many data applications. Other aspects,
including threshold models and an extension of the generalized Pareto model to the multivariate
case, are presented in Falk et al. (2004).
2.2.2 The multivariate extremal index
As in the univariate case, we wish to relax the assumption of temporal independence. Main-
taining sufficient asymptotic independence is enough to obtain the same limiting class. A
multivariate counterpart of the condition $D(u_n)$ can ensure asymptotic independence at ex-
treme levels. For a sequence of thresholds $u_n$, consider the events $A = \bigcap_{i \in I} \{X_i \le u_n\}$ and
$B = \bigcap_{i \in J} \{X_i \le u_n\}$, where $I \subset \{1, \ldots, k\}$ and $J \subset \{k + l, \ldots, n\}$.
Condition $D(u_n)$ (Hüsler, 1990) is said to be satisfied if, with the above choices, we have
\[
|\Pr\{A \cap B\} - \Pr\{A\} \Pr\{B\}| \le \alpha_{n,l},
\]
where $\alpha_{n,l_n} \to 0$ as $n \to \infty$ for some sequence $l_n = o(n)$.
Under $D(u_n)$, the class of limit distributions is the same as in the temporally independent
case.
Theorem 2.22 (Hüsler (1990)). Let $X_i$ be a stationary sequence for which there exist sequences
of constants $a_n > 0$ and $b_n$, and a distribution function with nondegenerate margins, such that
\[
\Pr\Bigl\{ \frac{M_n - b_n}{a_n} \le x \Bigr\} \xrightarrow{d} G(x) \quad \text{as } n \to \infty.
\]
If $D(u_n)$ holds with $u_n = a_n x + b_n$ for each $x$ such that $G(x) > 0$, then $G$ is a multivariate
extreme-value distribution function.
Since the class of limiting distributions for maxima is the same for stationary processes as
for independent processes, we can use any representation of the max-stable processes. We can
ask what the relationship is between the limiting extreme-value distributions of a stationary
sequence and its associated independent sequence. Certainly, the link must be more complex
than in the univariate case. Transformations of the components $X_{id}$ of $X_i$ may yield the same
margins, but they do not change the temporal dependence structure of the marginal processes,
so the summary measure, the univariate extremal index $\theta_d$, remains the same for each marginal
sequence. There is no reason to assume the $\theta_d$ equal to one another, which means that the
multivariate extremal index cannot be a single constant.
Let $\tilde X_i$ denote the independent sequence associated with $X_i$, $\tilde G$ the limiting distribution of the
maximum of $\tilde X_i$, and $\tilde l$ its stable tail dependence function:
\[
\Pr\{M_n \le u_n\} \to G(x), \qquad
\Pr\{\tilde M_n \le u_n\} \to \tilde G(x) \qquad \text{as } n \to \infty.
\]
Let $x(\tau)$ be such that $\tau_d = -\log \tilde G_d(x_d) = -\theta_d^{-1} \log G_d(x_d)$. Let $x_n(\tau) \in \mathbb{R}^D$ be such that
$x_n(\tau) \to x(\tau)$, and write $u_n(\tau) = a_n x_n(\tau) + b_n$. The multivariate extremal index is de-
fined as follows.
Definition 2.23 (Nandagopalan (1994)). The extremal index of the sequence $X_n$ is
\[
\theta(\tau) = \frac{\log G(x(\tau))}{\log \tilde G(x(\tau))}.
\]
The multivariate extremal index function can also be expressed with the stable tail depen-
dence functions as
\[
\theta(\tau) = \frac{l(\theta\tau)}{\tilde l(\tau)},
\]
where $\theta\tau = (\theta_1 \tau_1, \ldots, \theta_D \tau_D)$ denotes componentwise multiplication. The definition parallels
that of the extremal index in the univariate case, since $G(x(\tau)) = \tilde G(x(\tau))^{\theta(\tau)}$. The properties
of the multivariate extremal index can be summarized as follows (Smith and Weissman, 1996):
(T1) $\theta(\tau)$ is continuous in $[0, \infty]$;
(T2) $\theta(s\tau) = \theta(\tau)$ for $0 < s < \infty$ and $\tau \in [0, \infty] \setminus \{0\}$;
(T3) $\theta(e_d) = \theta_d$ for $d = 1, \ldots, D$, where $e_d$ is the $d$th unit vector in $\mathbb{R}^D$;
(T4) $0 \le \theta(\tau) \le 1$ for $\tau \in [0, \infty]$;
(T5) if $G$ has unit Fréchet margins, then the value $\theta(\tau)$ of the extremal index function at $\tau$ is
equal to the univariate extremal index of the sequence $W_i(\tau) = \max_{d} \tau_d X_{id}$.
Properties (T1)-(T3) follow from the properties of the stable tail dependence function. The
homogeneity property (T2) implies that $\theta(\tau)$ is constant along rays in $[0, \infty]$, so the informa-
tion comprised in the extremal index function is contained in the restriction of $\theta(\tau)$ to the
unit simplex $S_D$ in $\mathbb{R}^D$. Property (T5) can be seen by considering the probabilities of the events
$\{\max\{W_1(\tau), \ldots, W_n(\tau)\} \le n\}$ and $\{\max\{\tilde W_1(\tau), \ldots, \tilde W_n(\tau)\} \le n\}$, and taking the limits along
threshold sequences $u_{nd} = n/\tau_d$, where $\tilde W_i$ is based on the associated independent sequence $\tilde X_i$.
These properties are not sufficient to characterize the extremal index function. Martins
and Ferreira (2005) and Ehlert and Schlather (2008) showed that the extremal index function
must satisfy a number of additional conditions, imposed by the properties (L3)-(L4) of the stable
tail dependence function. Ehlert and Schlather (2008) give sharp upper and lower bounds as
functions of the univariate extremal index values and either the extremal coefficient $\vartheta$ or the
adjusted extremal coefficient $l(\theta\mathbf{1})$, where $\theta\mathbf{1}$ is the componentwise product of $(\theta_1, \ldots, \theta_D)$
and $\mathbf{1}$.
The interpretations of the multivariate extremal index are similar to those of the univariate
extremal index. Under condition $D(u_n(\tau))$, for a sequence of integers $r_n = o(n)$ chosen in the
same way as in the univariate case, we have
\[
\theta(\tau)^{-1} = \lim_{n \to \infty} E\Bigl[ \sum_{i=1}^{r_n} I\{X_i \not\le u_n(\tau)\} \Bigm| \sum_{i=1}^{r_n} I\{X_i \not\le u_n(\tau)\} \ge 1 \Bigr], \tag{2.16}
\]
which corresponds to the interpretation of the extremal index as the inverse expected limiting
cluster size, and
\[
\theta(\tau) = \lim_{n \to \infty} \Pr\Bigl\{ \max_{i=2,\ldots,r_n} X_i \le u_n(\tau) \Bigm| X_1 \not\le u_n(\tau) \Bigr\}, \tag{2.17}
\]
yielding its interpretation as the conditional probability of a series of non-exceedances fol-
lowing an exceedance.
Estimation methods do not yet exist for the multivariate extremal index. From the point of
view of estimation, property (T5) is extremely useful: it reduces the problem of estimating the
multivariate extremal index function to a collection of univariate estimation tasks. We choose
a fine enough grid $\mathcal{G} = \{\tau_1, \ldots, \tau_N\}$ on $S_D$, define the sequence $W_i(\tau)$ for all $\tau \in \mathcal{G}$, and
perform univariate estimation by any one of the univariate methods. This involves not only
the same problems of threshold, run parameter or block size selection as in the univariate
case, but adds a new one: the suitable choice may differ from gridpoint to gridpoint.
These methods are likely to give coarse estimates on the simplex, varying violently between the
gridpoints and subject to different biases at different locations. Moreover, the estimates will not
satisfy the constraints on the extremal index.
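The reduction described above can be sketched in a few lines. The runs estimator, the quantile level and the run length below are illustrative choices made here, not recommendations from the thesis:

```python
def runs_estimator(w, u, r):
    """Univariate runs estimate of the extremal index: the proportion of
    exceedances of u followed by at least r non-exceedances (runs are
    truncated at the end of the series)."""
    exceed = [i for i in range(len(w)) if w[i] > u]
    if not exceed:
        return float("nan")
    ends = sum(1 for i in exceed
               if all(w[j] <= u for j in range(i + 1, min(i + 1 + r, len(w)))))
    return ends / len(exceed)

def extremal_index_on_grid(x, grid, p=0.95, r=2):
    """Estimate theta(tau) at each grid point tau on the simplex using
    property (T5): form W_i(tau) = max_d tau_d * X_id, then apply a
    univariate estimator with a per-series empirical quantile threshold."""
    estimates = []
    for tau in grid:
        w = [max(td * xid for td, xid in zip(tau, row)) for row in x]
        u = sorted(w)[int(p * (len(w) - 1))]
        estimates.append((tau, runs_estimator(w, u, r)))
    return estimates
```

As noted above, the auxiliary parameters p and r may need to vary from gridpoint to gridpoint, which is exactly the difficulty this construction exposes.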
2.2.3 The M4 model
The M4 approximation was proposed originally by Smith and Weissman (1996), based on earlier
work of Deheuvels (1983). In his article on the links between representations of point pro-
cesses and multivariate extreme-value distributions, P. Deheuvels proves the following theorem:
Theorem 2.24. Suppose that $(T_1, \ldots, T_D)$ follows a joint multivariate extreme-value distri-
bution for minima with unit exponential margins, and let $\{V_k\}$ be an independent, identically
distributed sequence of unit exponential variables. Then there exist $D$ sequences $c^{(n)}_{dk} > 0$, $k \in \mathbb{Z}$,
depending on $n$, such that the joint distribution of the variables
\[
T^{(n)}_d = \min_{k \in \mathbb{Z}} \bigl\{ c^{(n)}_{dk} V_k \bigr\},
\]
defined for $d = 1, \ldots, D$, converges weakly to the joint distribution of $(T_1, \ldots, T_D)$ as $n \to \infty$:
\[
\bigl( T^{(n)}_1, \ldots, T^{(n)}_D \bigr) \xrightarrow{d} (T_1, \ldots, T_D).
\]
Defining $Y_d = T_d^{-1}$, $d = 1, \ldots, D$, in the above theorem, assuming that the independent
variables $Z_k$ follow a unit Fréchet distribution, and setting $Y^{(n)}_d = \max\{\alpha^{(n)}_{dk} Z_k\}$, $-\infty < k < \infty$,
with $\alpha^{(n)}_{dk} = \bigl( c^{(n)}_{dk} \bigr)^{-1}$, we get the corresponding theorem for maxima with Fréchet margins.
From now on, we use the consequent approximation
\[
Y_{id} = \max_{k \in \mathbb{Z}} \alpha_{idk} Z_k, \qquad i = 1, \ldots, n, \quad d = 1, \ldots, D, \tag{2.18}
\]
where the $\alpha_{idk}$ are nonnegative constants and the variables $\{Z_k\}$ are independent unit Fréchet.
For this approximation, Smith and Weissman (1996), adapting the arguments of Deheuvels,
prove that stationarity implies the existence of a bijective mapping that maps the coefficients
$\alpha_{idk}$ to one another and thereby gives rise to equivalence classes of the coefficients.
Theorem 2.25. Suppose that the sequence $Y_i = (Y_{id},\ d = 1, \ldots, D)$ is defined by equation (2.18)
and is stationary. Assume that
(i) for each pair $(i, d)$ the constants $\{\alpha_{idk},\ k \in \mathbb{Z}\}$ are distinct;
(ii) there exists a sequence $1 < n_1 < n_2 < \cdots$, increasing to infinity, such that the values of
\[
\Bigl\{ \alpha_{idk} \Bigm/ \sum_{j=1}^{n_k} \sum_{r=1}^{D} \alpha_{jrk},\ i = 1, \ldots, n,\ d = 1, \ldots, D \Bigr\}
\]
are distinct for every $n_k$.
Then there exists a bijective mapping $v : \mathbb{Z} \to \mathbb{Z}$ such that for any $i \in \mathbb{Z}$, $d = 1, \ldots, D$
and $k \in \mathbb{Z}$,
\[
\alpha_{idk} = \alpha_{0\,d\,v_i(k)}.
\]
We can thus define the equivalence relation $p \sim q$ between two integers $p$ and $q$ if there exists
$i \in \mathbb{Z}$ such that $p = v_i(q)$. This partitions $\mathbb{Z}$ into equivalence classes, and also induces a partition
of the coefficients $\alpha_{idk}$. By reordering within each equivalence class, a new representation of the
process, called the RS class, can be obtained.
Theorem 2.26. If $Y_i$ is given by equation (2.18) with $\alpha_{idk} = \alpha_{0\,d\,v_i(k)}$ for some bijection $v$, and
if $\sum_k \alpha_{0dk} < \infty$ for each $d$, then there exists a decomposition
\[
Y_{id} = \max\{R_{id}, S_{id}\}, \qquad d = 1, \ldots, D, \tag{2.19}
\]
with
\[
R_{id} = \max_{l \in I} \max_{k \in \mathbb{Z}} a_{lkd} Z_{l,i-k}, \qquad
S_{id} = \max_{l \in F} \max_{k \in \{0, \ldots, N_l\}} b_{lkd} Z'_{l,i-k},
\]
where $I$ and $F$ are two subclasses of indices $l$, the $Z_{l,j}$ and $Z'_{l,j}$ are mutually independent unit
Fréchet variables, and $Z'_{l,i+m(N_l+1)} = Z'_{l,i}$ for all $m \in \mathbb{Z}$ and all $l \in F$.
The constants $a_{lkd}$ and $b_{lkd}$ are nonnegative, and normalizing them so that $\sum_{l,k} a_{lkd} +
\sum_{l,k} b_{lkd} = 1$ for all $d = 1, \ldots, D$ ensures unit Fréchet margins of $Y_i$. The $R_i$ component
of the process is a random process, whereas the $S_i$ component is completely determined by the
filter coefficients $b_{lkd}$ once $Z'_{l,i}$ is given for $i = 0, \ldots, N_l$ and $l \in F$.
The spectral representation due to de Haan (1984) and de Haan and Pickands (1986) yields
an alternative derivation of the M4 representation.
Theorem 2.27. A $D$-variate max-stable process $X_i$ can be represented as
\[
X_{id} = \max_{j \ge 1} f_{id}(S_j) U_j, \qquad i \in \mathbb{Z}, \quad d \in \{1, \ldots, D\}, \tag{2.20}
\]
where the points $(U_j, S_j)$ are the points of a Poisson process on $\mathbb{R}_+ \times [0, 1]$ with intensity
$u^{-2}\,du \times ds$, and the $\{f_{id}\}$ are sequences of deterministic spectral functions.
The RS class (2.19) can be found as a discrete and stationary version of this representation,
though the original introduction follows the arguments of Deheuvels (1983) and Smith and
Weissman (1996), summarized above. The dependence functions of the RS class are given by
\[
l(\theta\tau) = \sum_{l \in \mathbb{Z}_+} \max_{k \in \mathbb{Z}} \max_{d \in \{1, \ldots, D\}} a_{lkd} \tau_d, \tag{2.21}
\]
\[
\tilde l(\tau) = \sum_{l \in \mathbb{Z}_+} \sum_{k \in \mathbb{Z}} \max_{d \in \{1, \ldots, D\}} a_{lkd} \tau_d
+ \sum_{l \in \mathbb{Z}_+} \sum_{k \in \{1, \ldots, N_l\}} \max_{d \in \{1, \ldots, D\}} b_{lkd} \tau_d, \tag{2.22}
\]
and the extremal index function by
\[
\theta(\tau) = \frac{\displaystyle \sum_{l \in \mathbb{Z}_+} \max_{k \in \mathbb{Z}} \max_{d \in \{1, \ldots, D\}} a_{lkd} \tau_d}
{\displaystyle \sum_{l \in \mathbb{Z}_+} \sum_{k \in \mathbb{Z}} \max_{d \in \{1, \ldots, D\}} a_{lkd} \tau_d
+ \sum_{l \in \mathbb{Z}_+} \sum_{k \in \{1, \ldots, N_l\}} \max_{d \in \{1, \ldots, D\}} b_{lkd} \tau_d}. \tag{2.23}
\]
47
CHAPTER 2. BACKGROUND
The sequence $X_i$ is called an M4 process if it is an RS process with no deterministic part.
Then it can be written in the form
\[
X_{id} = \max_{l \in \mathbb{Z}_+} \max_{k \in \mathbb{Z}} a_{lkd} Z_{l,i-k}, \qquad d = 1, \ldots, D, \tag{2.24}
\]
and its extremal index is given by
\[
\theta(\tau) = \frac{\displaystyle \sum_{l \in \mathbb{Z}_+} \max_{k \in \mathbb{Z}} \max_{d \in \{1, \ldots, D\}} a_{lkd} \tau_d}
{\displaystyle \sum_{l \in \mathbb{Z}_+} \sum_{k \in \mathbb{Z}} \max_{d \in \{1, \ldots, D\}} a_{lkd} \tau_d}. \tag{2.25}
\]
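With finitely many signatures and lags, (2.24) and (2.25) reduce to finite maxima and sums. A minimal sketch (the coefficient layout a[l][k][d] and all numerical values are illustrative assumptions made here):

```python
import math
import random

def simulate_m4(a, n, seed=1):
    """Simulate n observations of an M4 process, equation (2.24), with
    filter coefficients a[l][k][d] (signature l, lag k, component d);
    for unit Frechet margins the coefficients should sum to 1 over
    (l, k) for each d. Shocks Z_{l,i} are independent unit Frechet,
    generated as -1/log(U) with U uniform on (0, 1)."""
    rng = random.Random(seed)
    L, K, D = len(a), len(a[0]), len(a[0][0])
    Z = [[-1.0 / math.log(rng.random()) for _ in range(n + K)] for _ in range(L)]
    return [[max(a[l][k][d] * Z[l][i + K - 1 - k]
                 for l in range(L) for k in range(K))
             for d in range(D)]
            for i in range(n)]

def m4_extremal_index(a, tau):
    """Extremal index function of the M4 process, equation (2.25):
    sum over l of the max over k, divided by the sum over l and k,
    of max_d a[l][k][d] * tau[d]."""
    num = sum(max(max(a[l][k][d] * tau[d] for d in range(len(tau)))
                  for k in range(len(a[l]))) for l in range(len(a)))
    den = sum(sum(max(a[l][k][d] * tau[d] for d in range(len(tau)))
                  for k in range(len(a[l]))) for l in range(len(a)))
    return num / den
```

For example, with a single signature and two equal coefficients 0.5 at consecutive lags, each shock dominates two consecutive time points, so exceedances come in pairs and (2.25) gives $\theta(\tau) = 1/2$.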
In the context of M4 processes, we will call the constants $a_{lkd}$ filter matrices or filter
coefficients, and the underlying generating variables $Z_{l,i}$ shock variables. The usefulness of the
M4 processes lies in the fact that they can be used as approximations to the temporal dependence
structures of general stationary multivariate max-stable processes.
Theorem 2.28 (Smith and Weissman (1996); Ehlert and Schlather (2008)). The multivariate
extremal index of any stationary max-stable process $Y_i$ can be approximated uniformly by the
extremal index function of an M4 process.
An estimation method can be built on this model. In order to estimate the multivariate ex-
tremal index of a given data set, as a first step, all the marginal sequences have to be transformed
to unit Fréchet margins. Then the constants $a_{lkd}$ must be estimated, after which the extremal
index function can be calculated. The identification of the constants can be based on the fact
that as we investigate exceedances above high thresholds, the neighborhoods of the extremes
resemble distinct patterns from a (possibly infinite) collection. Zhang and Smith (2004) prove
that if there are $L$ shock sequences generating the process $Y_i$, and the filter coefficients are such
that $a_{lkd} \ne 0$ only for $k \in \{1, \ldots, K\}$, then
\[
\Pr\biggl[ \frac{Y_{t+m,d}}{\sum_{i=1}^{K} Y_{t+i,d}} = \frac{a_{lmd}}{\sum_{i=1}^{K} a_{lid}} \text{ infinitely often} \biggr] = 1.
\]
This implies the repeated occurrence of the same patterns at high thresholds. The patterns
are called signatures, since there is one distinct pattern corresponding to each generating shock
sequence. It should be noted that the M4 process in this application is an approximating model
for general multivariate distributions, and it is not necessary to seek real-life interpretations
of the shock variables, though there might be special cases where such interpretations are useful.
Heffernan et al. (2007) consider possible extensions of the model to include asymptotic
independence. In the form above, the M4 process can only approximate asymptotically dependent
sequences. Real data can exhibit asymptotic independence, which is difficult to estimate: fit-
ting an asymptotically dependent model at a moderately high threshold tends to overestimate
the dependence when extrapolating to very high levels. In order to model such cases, Hef-
fernan et al. (2007) introduce two different extensions of the M4 model. The first consists of
the introduction of shock variables following the GEV distribution, and it is proven that, us-
ing Gumbel- or Weibull-distributed variables, we can approximate asymptotically independent
random sequences too. The second is an extension obtained by taking
\[
Y_{i,d} = \max\Bigl\{ U_{i,d}^{1/2},\ \max_{l} \max_{k} a_{lkd} Z_{l,i-k} \Bigr\},
\]
where the $a_{lkd}$ are nonnegative constants, and $Z_{l,i}$ and $U_{i,d}$ are independent arrays of independent,
identically distributed positive random variables. The range from asymptotic independence
and near-independence to asymptotic dependence can be modelled by considering Fréchet and
unit exponential shock variables.
The thesis of Zhang (2002) and a paper of Zhang (2008) propose ways of estimating max-
stable processes based on the M4 approximation, mainly by finding frequently occurring pro-
portions between successive observations in the series. This work proposes another approach:
instead of using characteristics that are easily masked by noise, such as proportions of successive
observations, we base the estimation procedure on similarities between whole neighborhoods of
extremes. The extremes behave as signposts of segments that are most likely to be approx-
imated well by the M4 model. We perform a preliminary neighborhood selection around the
exceedances of a high threshold, and then model these clusters in a semiparametric way.
2.3 Overview of the literature
The goal of this thesis is to develop flexible estimation methods for clustering of extreme events.
Since the progress of extreme-value statistics is rapid, and new applications bringing new pro-
cedures and methods appear each year, the subject is too vast to be presented in a few dozen
pages. I therefore chose to present here only those parts which are directly relevant to the uni-
and multivariate extremal index or to the proposed new methodology; much other important
material was omitted entirely. A short summary of the omitted literature follows.
The interest in extremes of a random sample is almost a century old. The first character-
ization of a limit distribution for the sample maximum was given by Fréchet (1927), and the
three possible nondegenerate distribution functions for the suitably normalized maximum of in-
dependent, identically distributed variables appeared in Fisher and Tippett (1928). Gnedenko
(1943) gave rigorous conditions and a proof for the extremal types theorem. An elegant, simpler
derivation is due to de Haan (1970, 1976). Aspects of tail behaviour and regular variation are
also discussed in these papers and, in general, in the works of de Haan between 1970 and 1976,
see for example de Haan (1970, 1974, 1976). Applications nowadays are mostly based on the
maximum likelihood principle; the asymptotic properties of the maximum likelihood estimators
for the GEV distribution (together with a broader class of nonregular models) were discussed
by Smith (1985). Other estimation procedures include the probability-weighted moments and
order statistics methods (de Haan, 1990; Hosking et al., 1985).
Threshold methods, using the generalized Pareto distribution as a conditional model of
excesses above a threshold, are widely used, mostly because they use data more parsimoniously
than the GEV model, and therefore give more reliable estimates. The model was introduced by
Pickands (1975). Estimation methods were developed and summarized in Davison and Smith
(1990).
The field was extended to the serially dependent case, for example by Berman (1964)
and Loynes (1965), who introduced the concept of mixing sequences as the first notion of asymp-
totic independence in stationary sequences. Leadbetter (1974) gave form to the notion of asymp-
totic independence that became a cornerstone in the construction of extreme-value statistics.
Later work includes, for example, Rootzén (1986), O'Brien (1987), de Haan et al. (1989), Smith
(1992) and Perfekt (1994), discussing extremes of various processes; Robinson and Tawn (2000)
on the effect of different sampling frequencies on the estimation of the extremal index; and
Ledford and Tawn (2003) on diagnostics of time dependence in univariate time series. Esti-
mation with covariates was considered for example by Davison and Smith (1990). Inference in
a nonstationary case by combined maximum likelihood methods and smoothing techniques is
discussed by Davison and Ramesh (2000), Hall and Tajvidi (2000a) and Chavez-Demoulin and
Davison (2005). The point process approach was first proposed by Pickands (1971), and was
developed by M. R. Leadbetter, T. Hsing and J. Hüsler, among others (Leadbetter (1983), Hsing
(1987) and Hsing et al. (1988)). The summary given in Section 2.1 was put together mostly
from Resnick (1987), Leadbetter et al. (1983), Embrechts et al. (1997) and the papers of Leadbetter
(1983), Leadbetter and Nandagopalan (1989), Hsing et al. (1988) and Chernick et al. (1991).
More details on random measures and point processes can be found, for example, in Kallenberg
(1983), Cox and Isham (1980) and Daley and Vere-Jones (2003).
2.3. OVERVIEW OF THE LITERATURE
Early references on multivariate extreme-value statistics are Sibuya (1960) and Gumbel
(1960). Balkema and Resnick (1977) identied the multivariate extreme-value distributions as a
subclass of the max-innitely divisible distributions. Galambos (1978) and Resnick (1987) both
contain summaries of multivariate extreme-value statistics. In recent years, advances in this eld
have been vast. Parametric models for the bivariate case were developed among others in Tawn
(1988), Coles and Tawn (1991) and Joe et al. (1992); these last two also include techniques based
directly on the point process approach. Various other representations, including a few useful in
spatial and spatio-temporal problems, appear in Hall and Tajvidi (2000b), Schlather and Tawn
(2003), Heernan and Tawn (2004), and Boldi and Davison (2007). Asymptotic independence
is investigated in a number of papers of Ledford and Tawn (1996, 1997, 1998) and Ramos and
Ledford (2008). Another approach, considering the multivariate extension of the generalized
Pareto distribution, is summarized in Falk et al. (2004).
The rst book devoted to extreme-value statistics was Gumbel (1958), and the next, rela-
tively recent monograph was Galambos (1978). It was followed by a number of important works
that summarize extreme value theory and its applications from various points of view, such
as Leadbetter et al. (1983), Resnick (1987), Embrechts et al. (1997), Coles (2001), Falk et al.
(2004), Beirlant et al. (2004), de Haan and Ferreira (2006) and Balkema and Embrechts (2007).
CHAPTER 2. BACKGROUND
Chapter 3
Univariate methods
This chapter presents the first original contribution of the thesis to extreme-value statistics, the proposal of likelihood methods for the estimation of the clustering of extremes in univariate sequences. Their introduction opens the way to assessing the effect of nonstationarity, or to including explanatory variables in models to explain part of the observed variability in the clustering of extremes.
In Section 3.1, we introduce the maximum likelihood estimator for the extremal index in univariate time series, and examine its validity. We discuss the issue of model misspecification and the asymptotic properties of the likelihood model. The tests developed are sensitive to misspecification from any source: they can help select threshold and run parameters, and inform us about problems with the fundamental assumptions of the model. The use of the new diagnostic methods is presented and their performance is assessed on simulated examples.
Section 3.2 shows the use of the likelihood methods on nonstationary data examples. First a simple application of a local linear fit is presented, using the Central England temperature series. The second application puts more emphasis on the use of the misspecification tests, which turn out to give a detailed picture of possible difficulties in the modelling of the extremes of a daily summer temperature sequence from Neuchâtel.
3.1 Likelihood methods
3.1.1 The likelihood and model misspecification
For a sequence of thresholds u_n, define the inter-exceedance time in the stationary sequence {X_i} by

T(u_n) = min{n ≥ 1 : X_{n+1} > u_n | X_1 > u_n},

and the corresponding K-gap by

S^{(K)}(u_n) = max{T(u_n) − K, 0},  K = 0, 1, ….
With the notation F_{i,j}(u_n) for the σ-field generated by the events {X_k ≤ u_n}, i ≤ k ≤ j, we re-state here the asymptotic independence condition of Section 2.1.5:

Condition Δ(u_n). For any A ∈ F_{1,k}(u_n) with P(A) ≠ 0, B ∈ F_{k+l,cr_n}(u_n) and 1 ≤ k ≤ cr_n − l,

|P(B | A) − P(B)| ≤ α(cr_n, l),  c > 0,

and there exists a sequence l_n = o(n) for which α(cr_n, l_n) → 0 as n → ∞ for all c > 0.
Then we can modify Theorem 2.19 from Section 2.1.5 in the following way:

Theorem 3.1. If there exist sequences of integers {r_n} and of thresholds {u_n} such that
(i) r_n → ∞, r_n F̄(u_n) → τ and P{M_{r_n} ≤ u_n} → e^{−θτ} as n → ∞ for some τ ∈ (0, ∞) and θ ∈ [0, 1], and
(ii) condition Δ(u_n) is satisfied,
then as n → ∞,

P{F̄(u_n) S^{(K)}(u_n) > t} → θ exp(−θt)  for t > 0,    (3.1)

for K fixed.

This theorem is a straightforward extension of Theorem 1 of Ferro and Segers (2003); it was given for K = 1 in Süveges (2007). The proof requires only small modifications compared to the original theorem, by considering P{F̄(u_n)(T(u_n) − K) > t}; it is not repeated here.
The limit distribution given above is easy to put to work in practice. Suppose that we observe a random sequence X_1, X_2, …, X_n that satisfies Δ(u_n). Suppose that N observations exceed the threshold u_n, let the collection of indices {j_i : X_{j_i} > u_n} denote the locations of the exceedances, let the inter-exceedance times be T_i = j_{i+1} − j_i, and let S_i^{(K)} denote the ith K-gap, that is, S_i^{(K)} = max(T_i − K, 0), i = 1, …, N − 1 and K = 0, 1, …. Assuming independence of the S_i^{(K)}, we can write the log-likelihood in the form

ℓ_K(θ; S_i^{(K)}) = (N − 1 − N_C) log(1 − θ) + 2 N_C log θ − θ Σ_{i=1}^{N−1} F̄(u_n) S_i^{(K)},    (3.2)

where N_C = Σ_{i=1}^{N−1} I(S_i^{(K)} ≠ 0). From this likelihood, a closed-form maximum likelihood estimator θ̂_n can be derived as the smaller root of a quadratic equation.
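As an illustration, setting the derivative of (3.2) to zero gives the quadratic s θ² − (N − 1 + N_C + s) θ + 2 N_C = 0, with s = Σ F̄(u_n) S_i^{(K)}, whose smaller root is the estimator. The following minimal sketch (our own code, not the thesis implementation; it plugs in the empirical exceedance probability for F̄(u_n)) computes it:

```python
import numpy as np

def kgaps_mle(x, u, k):
    """Closed-form maximum likelihood estimate of the extremal index
    from K-gaps (illustrative sketch of the estimator derived from (3.2))."""
    x = np.asarray(x)
    exc = np.flatnonzero(x > u)              # exceedance locations j_i
    n, N = len(x), len(exc)
    if N < 2:
        raise ValueError("need at least two exceedances")
    T = np.diff(exc)                          # inter-exceedance times T_i
    S = np.maximum(T - k, 0)                  # K-gaps S_i^{(K)}
    s = (N / n) * S.sum()                     # sum of F-bar(u_n) * S_i^{(K)}
    C = int(np.count_nonzero(S))              # N_C: number of nonzero K-gaps
    m = N - 1                                 # total number of gaps
    if C == 0:
        return 0.0                            # all gaps zero: l(theta) = m*log(1-theta)
    # smaller root of  s*theta^2 - (m + C + s)*theta + 2*C = 0
    b = m + C + s
    return (b - np.sqrt(b * b - 8.0 * C * s)) / (2.0 * s)
```

The discriminant is always nonnegative, and the smaller root never exceeds 1, since the quadratic is nonpositive at θ = 1.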
The form (3.2) agrees with the compound Poisson model. The parameter K is the largest inter-exceedance time which is considered as a within-cluster time, and inter-exceedance times longer than K are treated as independent inter-cluster periods. Accordingly, the number N − 1 − N_C of the zero K-gaps (or equivalently, of the inter-exceedance times no longer than K) represents the number of exceedances additional to the leading exceedances of the clusters. For an independent process X_t, this would be zero in the limit, since there the extremes occur individually, not in groups. The exponential part describing the distribution of the inter-cluster distances refers to the truncated variables T_i − K = S_i^{(K)}, instead of the inter-exceedance times T_i. The extremal dependence in the process X_i blurs the extreme events: every exceedance seems to have a nonzero extent of length K, so a cluster extending from time j_1 to time j_2 in the observed series has an effective length j_2 − j_1 + K, and this additional length must be subtracted from the inter-cluster distances.

The choice of the truncation parameter K is therefore crucial; it is, in fact, the choice of the run parameter, and corresponds to the supposed influence range of the serial dependence. Too small a value of K implies a serious model misspecification by wrongly imposing independence on gaps that are in fact dependent within-cluster distances. Too large a K causes bias by assigning too many inter-exceedance times to the point mass part of the mixture distribution, and by truncating too strongly the inter-cluster K-gaps in the exponential part, though it does not imply misspecification due to overlooked dependence. Considering the effects of misspecification in the likelihood is therefore imperative.
3.1.2 Asymptotic properties

Beginning already in the sixties, the statistical literature dealing with econometrics extensively considered problems linked to model misspecification. A full treatment of this problem can be found in White (1994); the book is a detailed account of general issues and many special cases of misspecification, generally in models used in finance and econometrics.

Model misspecification should be treated as a crucial issue in extreme-value statistics, too. We use models that are valid only in the special setting of n → ∞, with u_n tending to the right endpoint x_R of the distribution, and we fit them to a sample of finite size, above thresholds that are in many cases not so close to x_R. When trying to fit the above-described truncated exponential model to the sequence of gaps between exceedances, we are faced with an additional possible source of misspecification: a wrong selection of the value of the run parameter K. This leads to a wrong assumption of independence of the sample. In these circumstances, to where does the estimator derived from the misspecified likelihood converge? How should we do inference in such models? How should we detect misspecification? Is there a way to find combinations of threshold and run parameter so that we are as close to a well-specified model as possible?

White (1982) provides the reply to these questions. Under broad assumptions, he proves that the estimator derived from the misspecified likelihood exists. Moreover, when the true model is not contained in the postulated model, it is consistent for that parameter value θ* within the misspecified family that minimizes the Kullback–Leibler discrepancy from the true distribution.
Define for our scalar parameter case the quantities

J(θ) = E_0{ℓ′(θ, S_j)²},
I(θ) = −E_0{ℓ″(θ, S_j)},
Ĵ_n(θ) = (N − 1)^{−1} Σ_{j=1}^{N−1} ℓ′(θ, S_j)²,
Î_n(θ) = −(N − 1)^{−1} Σ_{j=1}^{N−1} ℓ″(θ, S_j),

where the primes denote differentiation with respect to θ, and E_0 is the expected value with respect to the true model. Under assumptions concerning differentiability with respect to θ and dominated integrability with respect to S_j of the log-likelihood, and supposing that θ* is an interior point in the parameter space and that I(θ*) is nonsingular, the restriction of Theorem 3.2 of White (1982) to a scalar case ensures that

√n (θ̂_n − θ*) → N(0, I(θ*)^{−2} J(θ*)),
and that

Î_n(θ̂_n)^{−2} Ĵ_n(θ̂_n) →_{a.s.} I(θ*)^{−2} J(θ*),

where →_{a.s.} denotes almost sure convergence. This states that the estimator derived from (3.2) using an arbitrary run parameter K is consistent for the value θ* which minimizes the Kullback–Leibler distance from the true distribution; moreover, it is asymptotically normally distributed with the sandwich variance I(θ*)^{−1} J(θ*) I(θ*)^{−1}, which can be estimated by its empirical version taken at θ̂_n.
Let now K_0 = min{K : the S_i^{(K)} are independent random variables}, and suppose K_0 < ∞. Let θ_0 be the true extremal index, and θ*^{(K)} the parameter value in the log-likelihood specified by K that minimizes the Kullback–Leibler discrepancy. The model is misspecified in any case where K < K_0, since we mistakenly assume independence for a subset of within-cluster gaps too. This misspecification, on one hand, results in θ*^{(K)} possibly falling very far from θ_0; the expected bias depends on the short-range behaviour of the process X_i, so knowledge about its finite-dimensional probability distributions would be necessary to find out more about it, and it can differ sharply from one process to another. On the other hand, dependence can imply strongly differing I(θ*^{(K)}) and J(θ*^{(K)}); this, too, will obviously be process-dependent.
3.1.3 Diagnostics for threshold and run parameter selection

Misspecification tests

Information matrix test

Tests to detect model misspecification can be based on the fact that for a well-specified model, J(θ) = I(θ). This relation can be expressed as

D(θ) = J(θ) − I(θ),    (3.3)

given in White (1982) as the basis for the Information Matrix Test. The test corresponding to D for the equality of J(θ) and I(θ) reduces to the formulation

H_0 : D(θ) = 0 against H_1 : D(θ) ≠ 0.

Define the one-observation indicator version d(s_j, θ) of D(θ). Let moreover D_n(θ) = n^{−1} Σ_{j=1}^{n} d(s_j, θ),
the finite-sample counterpart of D(θ). Finally define

V(θ) = E{[d(S_j, θ) + D′(θ) I(θ)^{−1} ℓ′(θ, S_j)]²},

which is the asymptotic variance of D(θ). Requiring a continuous third derivative for the misspecified likelihood ℓ(S_j, θ), dominated integrability for D(θ)², |D′(θ)| and |D(θ) ℓ′(θ, S_j)|, and supposing that V(θ) is nonsingular, White (1982) proves the following theorem (given here in a version restricted to the scalar parameter case):
Theorem 3.2. If the assumed model ℓ(S_j, θ) contains the true model for some θ = θ_0, then
(i) √n D_n(θ̂_n) → N(0, V(θ_0));
(ii) V_n(θ̂_n) →_{a.s.} V(θ_0);
(iii) the test statistic T(θ̂_n) = n D_n(θ̂_n)² V_n(θ̂_n)^{−1} is distributed asymptotically as χ²_1.
A procedure for testing for misspecification in the extreme-value context can therefore be constructed using the test statistic T(θ̂_n) to test for J(θ) = I(θ). Calculating the function D_n(θ̂_n) and its empirical variance V_n(θ̂_n), forming the test statistic n D_n(θ̂_n)² V_n(θ̂_n)^{−1}, and comparing it to the quantile of the χ²_1-distribution at a desired level will inform us whether we can accept hypothesis H_0 at that confidence level or not.
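A sketch of this procedure for the K-gaps likelihood follows. The per-observation derivatives come directly from (3.2), with log(1 − θ) for zero gaps and 2 log θ − θc for nonzero normalized gaps c = F̄(u_n)S; the empirical plug-ins used for D′(θ) and I(θ) are our own choices, not taken from the thesis:

```python
import numpy as np

def im_test_statistic(s, fbar, theta):
    """White's information matrix test statistic T(theta) for likelihood (3.2)
    (illustrative sketch). s: K-gaps, fbar: exceedance probability."""
    c = fbar * np.asarray(s, dtype=float)
    zero = c == 0
    # first, second and third derivatives of each log-likelihood contribution
    l1 = np.where(zero, -1.0 / (1.0 - theta), 2.0 / theta - c)
    l2 = np.where(zero, -1.0 / (1.0 - theta) ** 2, -2.0 / theta ** 2)
    l3 = np.where(zero, -2.0 / (1.0 - theta) ** 3, 4.0 / theta ** 3)
    d = l1 ** 2 + l2                      # one-observation indicator d(s_j, theta)
    Dn = d.mean()                         # D_n(theta)
    Dp = (2.0 * l1 * l2 + l3).mean()      # empirical D'(theta)
    Ihat = -l2.mean()                     # empirical I(theta)
    v = d + Dp * l1 / Ihat                # sandwich-corrected contributions
    Vn = (v ** 2).mean()                  # empirical V_n(theta)
    return len(c) * Dn ** 2 / Vn          # asymptotically chi-squared, 1 df
```

Under a well-specified model evaluated at the maximum likelihood estimate, the returned statistic should be a modest value compatible with the χ²_1 distribution.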
The empirical J(θ̂^{(K)})/I(θ̂^{(K)}) ratio

Writing

Ĵ = (N − 1)^{−1} Σ_{j=1}^{N−1} ℓ′(θ, S_j)² = i(θ) + (N − 1)^{−1} Σ_{j=1}^{N−1} {ℓ′(θ, S_j)² − i(θ)}

and

Î = −(N − 1)^{−1} Σ_{j=1}^{N−1} ℓ″(θ, S_j) = i(θ) − (N − 1)^{−1} Σ_{j=1}^{N−1} {ℓ″(θ, S_j) + i(θ)},

where i(θ) is the expected information from one observation, by a Taylor series expansion and under the assumption of independence we obtain

E{Î^{−1} Ĵ} = 1 + O(N^{−1})

and

var{Î^{−1} Ĵ} = N^{−1} i(θ)^{−2} [var{ℓ′(θ, S)²} + var{ℓ″(θ, S)}].    (3.4)
To trace deviations from our basic assumptions in the specification of the likelihood, we can exploit (3.4) too. It is straightforward to calculate the maximum likelihood estimates of the extremal index for a range of run parameters, replace the quantities in (3.4) by their empirical values and substitute the estimate θ̂^{(K)} for θ. Then a plot of the ratio J(θ̂^{(K)})/I(θ̂^{(K)}) versus K, together with its approximate confidence band around its expected value 1 under independence, will provide us with information on whether we are wrong or right in assuming this for a given K: if the ratio is outside the confidence bands, that hints at the independence assumption being wrong. On the other hand, a near-constant ratio close to 1 for a range of K suggests that in this range we are not mistaken in the independence assumption, and these run parameter values are acceptable.
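The empirical ratio can be sketched as follows (our own illustration; the per-observation score and Hessian follow from the zero-gap and nonzero-gap contributions to (3.2)):

```python
import numpy as np

def ji_ratio(s, theta, fbar):
    """Empirical J/I ratio for the K-gaps likelihood (illustrative sketch).
    s: K-gaps S_i^{(K)}, theta: estimate, fbar: exceedance probability."""
    c = fbar * np.asarray(s, dtype=float)
    zero = c == 0
    score = np.where(zero, -1.0 / (1.0 - theta), 2.0 / theta - c)        # l'(theta, S_j)
    hess = np.where(zero, -1.0 / (1.0 - theta) ** 2, -2.0 / theta ** 2)  # l''(theta, S_j)
    J = np.mean(score ** 2)
    I = -np.mean(hess)
    return J / I
```

Under a correctly specified model the ratio should hover near 1, with fluctuations of the order given by (3.4).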
The other main source of model misspecification is bad threshold choice, which is equivalent to an invalid extreme-value approximation. Both the tests described so far will in practice detect every misspecification, whether originating in bad threshold selection or in bad run parameter choice, without being able to distinguish between the origins. It is best therefore to calculate the test statistics for all pairs from a range of thresholds and run parameters, and to plot the ratios J(θ̂^{(K)})/I(θ̂^{(K)}) and the test statistic T(θ̂^{(K)}) as surfaces above the plane (u, K): surfaces wobbling badly or falling far from the ideal values (0 for the test T(θ̂^{(K)}), 1 for the ratio) warn us to avoid these threshold–run parameter combinations.
Condition D^{(K)}(u_n)

Still another test for the choice of the run parameter K can be based on Condition D^{(K)}(u_n) (Chernick et al., 1991), presented in Section 2.1.1. It ensures that the probability of exceeding the high threshold u_n again after a run of at least K non-exceedances tends to zero. By the discussion at the end of Section 2.1.4, it provides a sufficient condition for the adequate modelling of clusters by exceedance sequences in which the exceedances are separated by fewer than K non-extreme observations.

In practice, condition D^{(K)}(u_n) may be checked by calculating

p(u, r) = Σ_{j=1}^{n} I(X_j > u, M_{j+1,j+K} ≤ u, M_{j+K+1,j+r} > u) / Σ_{j=1}^{n} I(X_j > u)    (3.5)
for the observed sequence X_1, …, X_n, where I is the indicator function. Given u and r, we can compute the proportion of the anti-D^{(K)}(u_n) events {M_{j+1,j+K} ≤ u, M_{j+K+1,j+r} > u | X_j > u} among the exceedances for a range of thresholds and block sizes; D^{(K)}(u_n) is satisfied if there exists a path (u_i, r_j) with u_i → x_R and r_j → ∞ for which p(u_i, r_j) → 0.
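The proportion (3.5) is easy to compute directly; a minimal sketch follows (our code; the reading of the window M_{j+K+1,j+r} as the maximum of X_{j+K+1}, …, X_{j+r} is our interpretation of the notation):

```python
import numpy as np

def anti_dk_proportion(x, u, k, r):
    """Empirical p(u, r): proportion of anti-D^(K)(u_n) events among the
    exceedances of u, following formula (3.5) (illustrative sketch)."""
    ind = np.asarray(x) > u
    exc = np.flatnonzero(ind)
    count = 0
    for j in exc:
        gap = ind[j + 1 : j + 1 + k]          # X_{j+1}, ..., X_{j+K}
        rest = ind[j + 1 + k : j + 1 + r]     # X_{j+K+1}, ..., X_{j+r}
        if len(gap) == k and not gap.any() and rest.any():
            count += 1
    return count / len(exc)
```

Scanning this quantity over a grid of thresholds u and block sizes r reproduces the surfaces shown in Figure 3.1.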
3.1.4 The iterative least squares estimator

We can define another estimator by exploiting the fact that, as was noted by Ferro (2003), the standard exponential quantile–quantile plot of F̄(u_n)S(u_n) can be fitted by a broken-stick model. One segment is composed of zeros and the other is a line with slope θ^{−1}, intersecting each other at (−log θ, 0). In practice, we have found that the best way to use this structure is the following iterative weighted least squares procedure:

(1) Calculate the sequence of 1-gaps, and select those that are nonzero. Denote them by {s*_i}. Scale them by their average and order them to make a sample {η_i : i = 1, …, N_C}, to compare to the standard exponential quantiles x_i = −log{1 − i/(N_C + 1)} for i = 1, …, N_C.

(2) Fit a weighted least squares model to the points {x_i, η_i} with weights w_i^{−1} = Σ_{j=N−i}^{N−1} j^{−2} to obtain the estimated intercept α̂ and slope β̂.

(3) Find an estimate of the extremal index by θ̂_LS = min{exp(α̂/β̂), 1}.

(4) Choose the largest θ̂_LS(N − 1) spacings from the original collection {s*_i}, and compare them to the original collection {s*_i} used to obtain θ̂_LS. If they are different, then it is possible that the new collection of 1-gaps yields a different extremal index estimate. Apply steps (2)–(4) to the points {(x_i, η_i) : i ≥ N − θ̂_LS(N − 1)}. If the two sets of 1-gaps are identical, accept the θ̂_LS obtained in (3) as the estimated extremal index.

This estimator works by finding a self-consistent estimate of θ: a first guess for θ is given by using the exponential nature of the inter-cluster distances; then, making use of the fact that a proportion 1 − θ of the observed exceedances should be secondary cluster elements, and so represented by within-cluster gaps among the gap set, it determines a new set of inter-cluster gaps. Then a second estimate is given, based on the exponential distribution of the new gap set. Alternating the exponential fits and the selection of the new gap set according to the estimate, the iteration is continued until it reaches a stable point, where neither the estimate nor the set of gaps changes further. The extremal index has a double role: first, it determines the mean cluster size; second, it determines the distribution of the inter-cluster times. This procedure uses one of them to derive an estimate, and the other to check the quality of the estimate and to determine a new data set on which a more precise estimate can be based. The final value is the estimate where these two agree.
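Steps (1)–(4) can be sketched as follows. This is our own illustration, with several assumptions flagged: the weights are taken as inverse variances of exponential order statistics, the iteration is capped at a fixed number of passes, and convergence is declared when the estimate stabilizes rather than by comparing gap sets exactly:

```python
import numpy as np

def iterative_ls_theta(x, u, max_iter=50, tol=1e-6):
    """Iterative weighted least squares estimate of the extremal index
    (illustrative sketch of steps (1)-(4), not the thesis code)."""
    exc = np.flatnonzero(np.asarray(x) > u)
    N = len(exc)
    s = np.sort(np.diff(exc) - 1.0)          # 1-gaps S_i^{(1)}, sorted
    s = s[s > 0]                             # nonzero 1-gaps {s*_i}
    theta = 1.0
    for _ in range(max_iter):
        m = min(max(int(round(theta * (N - 1))), 2), len(s))
        sel = s[-m:]                         # largest theta*(N-1) spacings
        eta = np.sort(sel / sel.mean())      # scaled, ordered sample
        i = np.arange(1, m + 1)
        xq = -np.log(1.0 - i / (m + 1.0))    # standard exponential quantiles
        # weights: inverse variances of exponential order statistics (assumption)
        w = 1.0 / np.cumsum(1.0 / (m - i + 1.0) ** 2)
        X = np.column_stack([np.ones(m), xq])
        a, b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * eta))
        new = min(float(np.exp(a / b)), 1.0)  # step (3): theta = exp(alpha/beta)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta
```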
Figure 3.1: The observed proportion of anti-D^{(1)}(u_i) events for the oscillating AR(2) (upper left panel), moving maximum process (upper right), Markov chain (lower left) and ARMA(1,4) (lower right). Foreground horizontal axis: F(u); right-hand horizontal axis: block size r; vertical axis: p(u, r).
3.1.5 Simulations

The use of D^{(K)}(u_n)

Examples of the use of the condition D^{(K)}(u_n) are given in Figure 3.1, for the special case K = 1. The upper left panel refers to an AR(2) process with φ_1 = 0.93 and φ_2 = −0.86, which does not satisfy condition D^{(1)}(u_n); the upper right panel shows a moving maximum sequence max{Z_{i−3}, Z_{i−2}, Z_{i−1}, Z_i}, Z_i being an independent, identically distributed random sequence, which satisfies D^{(1)}(u_n). The two lower panels contain a Markov process with symmetric logistic joint distribution with parameter r = 2 (Smith, 1992), and an ARMA(1,4) process with innovations following a Pareto distribution, with autoregressive parameter 0.82 and moving average parameters 0.11, 0.13, 0.13 and 0.13.
One aspect of the plots is the trend of the proportions. Finding a direction of decreasing proportion p(u, r) with u → x_R and r → ∞ on the plots amounts to finding that there is a threshold where the contiguous cluster model will be a good approximation. On the other hand, even if there is no direction with a marked downward trend, an impression of the bias can be gained from the proportion of anti-D^{(1)}(u_n) events, since these give information on how many clusters are misidentified as two or more clusters; if such clusters are few, we might decide to accept this relatively small upward bias despite the possible failure of D^{(1)}(u_n). The four panels of Figure 3.1 illustrate these types of behaviour: for the AR(2) process, there is no direction of downward trend, and we find a very high bias with the 1-gaps estimator; for the moving maxima sequence, D^{(1)}(u_n) is satisfied and the 1-gaps maximum likelihood estimator works excellently; for the ARMA(1,4) sequence, there seems to be a very low proportion of anti-D^{(K)}(u_n) events independently of (u, r), and the maximum likelihood estimator using 1-gaps yields good results; and for the Markov chain model, the proportion of anti-D^{(1)}(u_n) events is high enough to cause a bias of around 20% in the maximum likelihood estimator θ̂ (cf. Figure 3.2). The AR(2) and the Markov chain will be reconsidered among the applications of the misspecification tests.
We compared the performances of the 1-gaps maximum likelihood and the iterative least squares estimators to the intervals, runs, and two-threshold estimators on the ARMA(1,4) and the Markov chain examples. For each process, R = 500 independent repetitions of sequences with lengths n = 2000 and 30,000 were generated, and estimation was performed on the 500 repetitions using the different estimators with four different thresholds corresponding to the empirical 0.95-, 0.975-, 0.9825- and 0.99-quantiles. The variance and bias of the estimators were assessed by calculating the root mean squared error and the relative median bias

B_est = median_{i=1,…,R}(θ̂_i)/θ − 1

for all estimators and plotting them against thresholds.
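The relative median bias defined above is a one-line computation (our own sketch):

```python
import numpy as np

def rel_median_bias(estimates, theta):
    """Relative median bias B_est = median(theta_hat_i)/theta - 1."""
    return np.median(np.asarray(estimates, dtype=float)) / theta - 1.0
```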
The results can be summarized as follows:
The simulations justified the role of condition D^{(1)}(u_n): for any process that satisfies this condition, the maximum likelihood estimator using 1-gaps performs well in terms both of bias and of mean squared error. The results for two processes are shown in Figure 3.2; these are the ARMA(1,4) and the Markov chain, for which the diagnostic plots to check D^{(1)}(u_n) are presented in Figure 3.1. For the ARMA(1,4), D^{(1)}(u_n) is satisfied. For the Markov chain, K > 1 seems to be a better choice. This will be checked by the misspecification tests in the next subsection.

Like the intervals estimator, the iterative weighted least squares estimator does not require the selection of a run parameter, and its performance is very close to that of the intervals method. As the iterative least squares method uses only the longer separations for the estimation, its variance is larger for short series with high thresholds. The simulations showed it to be less sensitive to the threshold choice: its behaviour changed little for any threshold in the range F̄(u_n) ≤ 0.05.

An inherent problem with the likelihood (3.2) is that it is not regular at θ = 1; independence cannot be nested into a general model and tested for by standard methods.
Figure 3.2: Comparison of the 1-gaps maximum likelihood (solid) and iterative least squares (dashed) estimators to the intervals (dotted), 2-threshold (dash-dotted) and runs estimators (long-dashed) on an ARMA(1,4) process (left panels) and on a Markov chain (right panels). The top row shows the root mean squared error, the bottom row the relative median bias. For the 2-threshold method, run length 8 and lower threshold F(l_n) = 0.5 were chosen. For the runs method, run length 8 was used. The number of observations was n = 30,000.
We can extend the model by defining the estimate as θ̂ = 1 when N_C = N − 1; in this case, bootstrap methods can inform us whether independence is plausible for a given data set. Additional simulations of independent processes with θ = 1 confirmed this.
The use of misspecification tests

We present the functioning and the performance of the tests based on the information matrix equalities on three processes:

AR(1): Y_i = φY_{i−1} + Z_i with φ = 0.7 and Z_i following a standard Cauchy distribution;
AR(2): Y_i = φ_1 Y_{i−1} + φ_2 Y_{i−2} + Z_i with φ_1 = 0.95, φ_2 = −0.89 and Z_i from a Pareto distribution with tail index 2, used already at the beginning of this section to demonstrate the failure of condition D^{(1)}(u_n);
Markov chain: with symmetric logistic bivariate distribution for consecutive variables and with dependence parameter r = 2 (Smith, 1992), another example of the use of condition D^{(1)}(u_n).

Figure 3.3: Three simulated examples, an AR(1) (left column), an AR(2) (middle column) and a Markov chain (right column). The top row gives an impression of each series, together with the 0.95-quantile (dotted line). The second and third rows show the estimated ratios I(θ̂^{(K)})/J(θ̂^{(K)}) and the estimated extremal index θ̂_K (heavy solid lines) as functions of the run parameter for a threshold u with F(u) = 0.95, with standard errors (thin solid). The dotted lines show the target value. The fourth and fifth rows give the ratio and the estimates for F(u) = 0.98.
The AR(1) model satisfies condition D^{(1)}(u_n), that is, the appropriate run parameter is K = 1; the AR(2) model satisfies D^{(6)}(u_n), with K = 6. We generated series of length n = 8000 of each, and using thresholds corresponding to the 0.95- and the 0.98-quantiles, the sequences of inter-exceedance times were calculated. Then, based on a sequence of run parameters K = 1, …, 12, we calculated the maximum likelihood estimates and the proportion J(θ̂^{(K)})/I(θ̂^{(K)}) for each K.
The top row of Figure 3.3 shows a short sample with a typical extreme cluster of each series, together with the thresholds corresponding to the 0.95-quantile. The middle pair of rows refers to a threshold choice u with F(u) = 0.95; the upper one presents I(θ̂^{(K)})/J(θ̂^{(K)}) and its empirical standard error, and the lower one the estimated extremal index with its 95% confidence interval, both as functions of K. The two bottom rows show the same quantities for a threshold choice of F(u) = 0.98.

The plots show the influence of the misspecification on the information sandwich. For the AR(1) process, where modelling clusters with contiguous exceedance sequences is a good choice, the observed ratio I(θ̂^{(K)})/J(θ̂^{(K)}) remains within the empirical confidence bands, though it seems to suggest the choice of a lower threshold with K = 2 or 3: the ratio is closest to 1 here. Accordingly, the estimated extremal index is quite close to the true value, which is contained within the confidence band. Experience on simulations and data examples gave the impression that although the correct statistical procedure suggests accepting as well specified all models for which the test statistics are within the acceptance region, the pairs (u, K) with test statistics very close to their ideal values generally yield the best estimates. Also, if these pairs are numerous, and form a continuous region in the (u, K) plane, the estimates for all pairs in the region are very similar, so that we can choose the smallest threshold and run parameter.
The case changes for the AR(2) process: here not only is the ratio I(θ̂^{(K)})/J(θ̂^{(K)}) far from 1, but it is well outside the empirical error bars until the misspecification disappears at K = 6. At that point, the sandwich factor drops to the value under independence of the variables, and then both it and the estimate of the extremal index remain stable.

Another possibility appears in the case of the Markov chain. Here, although I(θ̂^{(K)})/J(θ̂^{(K)}) is at least marginally within the confidence interval, a steady decrease at the beginning followed by a stabilization hints at misspecified independence for small K. At the K values where the ratio becomes approximately stable, so too does the estimated extremal index.
Figure 3.4: Comparison of the K-gaps maximum likelihood (solid) and iterative least squares (dashed) estimators to the intervals (dotted) estimator on the AR(1) process (left panels), on the AR(2) process (middle panels) and on the Markov chain examples (right panels). K = 1 for the AR(1) process, K = 6 for the AR(2) process, and K = 5 for the Markov chain. The top row shows the root mean squared error, the bottom row the relative median bias. The number of observations was n = 30,000.
A simulation study was performed on 1000 repetitions of each of these processes, using the K values suggested by the misspecification tests: K = 1 for the AR(1) process, K = 6 for the AR(2), and K = 5 for the Markov chain. As with the 1-gaps maximum likelihood estimator, we used simulated processes of sample sizes n = 2000 and n = 30,000, and thresholds corresponding to the 0.95-, 0.96-, 0.97-, 0.98- and 0.99-quantiles. The relative median bias and the root mean squared error for the case n = 30,000 are shown in Figure 3.4. The plots confirm the impression of the good properties of the iterative least squares and the maximum likelihood estimators gained from the study of 1-gaps. Using the right value of the run parameter, the maximum likelihood estimator has in general lower bias and root mean squared error than the intervals estimator. The iterative least squares estimator, although it is clearly not a flexible method adaptable to nonstationarity and the use of covariates, also shows very good properties in terms of bias and variance, and its use does not require the selection of a run parameter.
3.1.6 Local extremal index

Since one of our main motivations for constructing likelihood methods for the estimation of extreme clusters is the nonstable behaviour of the climate, we examine now how to apply them to detect nonstationarity. Stationarity is only rarely observed in practice for long time series; a fixed type of parametric model with parameters varying with time more often yields a useful description of the data. The point process description of the extremes of Section 2.1 and the likelihood model presented in Section 3.1.1 offer a way to recognize problems related to nonstationarity: likelihood models can easily be used together with smoothing methods.
Smoothing methods

The idea of semiparametric regression arose from cases of data analysis where the variable of interest, say Y_i, depends on some covariates x_{1i}, x_{2i}, but the functional relationship between the two does not admit a simple form, or is unknown. This is exactly our motivation for introducing these methods into the study of the clustering characteristics of climatic extremes. With the climate changing, we must admit the possibility that these are also varying with time. It is evident also that climate physics cannot easily provide an exact form of this dependence on time. Therefore, the methods borrowed from semiparametric regression can be very useful in the analysis of clusters of climatic extremes. Topics of this methodology are described for example in Fan and Gijbels (1996) or in Bowman and Azzalini (1997). The roughness penalty approach, which is not used in this thesis and will not be discussed further, is summarized in Green and Silverman (1994). A brief, practical account of both approaches may be found in Davison (2003, Section 10.7).
Suppose initially that the response variable depends on a covariate in the following way:
\[ Y_i = g(x_i) + \varepsilon_i, \]
where $g(x)$ is a smooth function of possibly unknown form, and the errors $\varepsilon_i$ are independent, having zero mean and variance $\sigma^2$. We want to estimate the function value $g(x_0)$ at the point $x_0$, based on the observed pairs $\{(x_i, y_i)\}$. It is desirable that observations far away from $x_0$ have weaker influence on the estimate $\hat g(x_0)$, so a kernel $w(u)$ is introduced such that $w(u)$ is a unimodal density function with unit variance and is symmetric about $u = 0$. For a fixed parameter $h$, called the bandwidth, the weights attached to the different $x_i$ are given by
\[ w_i = \frac{1}{h}\, w\!\left(\frac{x_i - x_0}{h}\right), \qquad i = 1, \ldots, n. \]
CHAPTER 3. UNIVARIATE METHODS
Finally, suppose that around $x_0$, the function $g(x)$ can be approximated by a polynomial $\beta_0 + \beta_1(x - x_0) + \cdots + \beta_k(x - x_0)^k$. Then, to find an estimate of the value $g(x_0)$, we need to fit the linear model $y = X\beta + \varepsilon$ with weight matrix $W = \mathrm{diag}(w_1, \ldots, w_n)$, where $y$ is the response vector, $\beta = (\beta_0, \ldots, \beta_k)$, and $X$ is the design matrix, whose $i$th row is $\{1, (x_i - x_0), \ldots, (x_i - x_0)^k\}$. The corresponding least squares estimate is $\hat\beta = (X^T W X)^{-1} X^T W y$, and thus the estimate of the function value $g(x_0)$ is $\hat g(x_0) = \hat\beta_0$.
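As a concrete illustration of the weighted least squares recipe above, the following minimal sketch computes $\hat g(x_0) = \hat\beta_0$; the function name `local_poly_fit` is ours, and a Gaussian density stands in for the unimodal, unit-variance, symmetric kernel $w$.

```python
import numpy as np

def local_poly_fit(x, y, x0, h, degree=1):
    """Weighted local polynomial estimate of g(x0).

    The weights are w_i = w((x_i - x0)/h)/h with a Gaussian kernel w, and
    the fitted intercept beta_0 of the weighted least squares fit is ghat(x0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x - x0) / h
    w = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi))
    # Design matrix with ith row {1, (x_i - x0), ..., (x_i - x0)^degree}
    X = np.vander(x - x0, N=degree + 1, increasing=True)
    XtW = X.T * w                        # X^T W with W = diag(w_1, ..., w_n)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                       # ghat(x0) = betahat_0
```

With `degree=0` this gives the local constant estimator and with `degree=1` the local linear one discussed next.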
The most commonly occurring choices in practice are the local constant and the local linear polynomial forms. For both, the asymptotic bias and variance can be written as
\[ E\{\hat g(x_0)\} - g(x_0) = \tfrac{1}{2} h^2 g''(x_0), \qquad \mathrm{var}\{\hat g(x_0)\} = \frac{\sigma^2}{n h f(x_0)} \int w(u)^2\, du, \]
at points far from the boundaries of the design space. Here $f(x_0)$ is the limiting density of design points, and the limit is taken so that $n \to \infty$, $h \to 0$ and $nh \to \infty$. Approximate confidence intervals can be based on the relation
\[ \frac{\hat g(x_0) - E\{\hat g(x_0)\}}{\mathrm{var}\{\hat g(x_0)\}^{1/2}} \;\overset{d}{\longrightarrow}\; N(0, 1), \]
but since the asymptotic bias is nonzero, this must be corrected using a bias estimate.
The choice of the kernel in general does not have a strong influence on the fit. More important issues are the selection of the bandwidth $h$ and the degree $k$ of the polynomial. Near the boundary of the design space, the bias of the local constant estimator increases to $O(h)$, whereas that of the local linear estimator remains $O(h^2)$, so it is better to choose the local linear estimator for its nicer characteristics at the boundaries. A way to choose the bandwidth $h$ is to minimize the corrected Akaike information criterion.
The weighted local polynomial estimation can be formalized in the framework of the likelihood approach. In this case, we maximize the local log-likelihood
\[ \ell(\beta, \sigma; x_0, y) = \sum_j w_j\, \ell_j(\beta, \sigma; x_0), \qquad (3.6) \]
with $\ell_j(\beta, \sigma; x_0)$ the contribution of $y_j$ to the likelihood. The estimate $\hat\beta$ is obtained by maximizing the likelihood in $\beta$. As an example, take the model $y_i = g(x_i) + \varepsilon_i$ considered above, and suppose that $\varepsilon_i \sim N(0, \sigma^2)$ for all $i$. In this case, the one-observation likelihood contribution in a local linear model will be
\[ \ell_j(\beta, \sigma; x_0) = -\frac{1}{2\sigma^2}\{y_j - \beta_0 - \beta_1(x_j - x_0)\}^2 - \log \sigma. \]
When we want to estimate the extremal index of a possibly nonstationary sequence based on the likelihood model (3.2),
\[ \ell_K\bigl(\theta; S^{(K)}_i\bigr) = (N - 1 - N_C)\log(1 - \theta) + 2 N_C \log\theta - \theta \sum_{i=1}^{N-1} \bar F(u_n)\, S^{(K)}_i, \]
we take $\theta$ to have a locally constant form,
\[ \theta(t) = \beta_0, \]
or a transformed locally linear dependence on time,
\[ \theta(t) = \frac{\exp\{\beta_0 + \beta_1 (t - t_0)\}}{1 + \exp\{\beta_0 + \beta_1 (t - t_0)\}}, \]
where the transformation ensures $\theta(t) \in (0, 1)$. To obtain estimates for the extremal index at $t_0$, we maximize numerically the log-likelihood (3.6) for $\beta_0$ in the local constant model, or for $\beta_0$ and $\beta_1$ in the local linear model, and take $\hat\theta(t_0)$ to be the fitted value of $\theta(t)$ at $t = t_0$. To obtain confidence intervals for the local estimate $\hat\theta(t)$, we will use bootstrap methods instead of asymptotic normality: due to the bias of the estimator, confidence intervals based on asymptotic normality are less reliable and harder to obtain, since we need an estimate of the bias.
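The stationary building block of this procedure, the numerical maximization of the K-gaps log-likelihood (3.2) in a constant $\theta$, can be sketched as follows; the function names are ours, and the gaps are assumed to be already scaled by $\bar F(u_n)$. The local linear variant would replace the constant $\theta$ by the logistic form inside a weighted sum of these contributions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kgaps_loglik(theta, s):
    """Log-likelihood (3.2) for scaled K-gaps s_i = Fbar(u_n) * S_i^(K).

    Zero gaps (within-cluster) contribute log(1 - theta); positive gaps
    contribute 2*log(theta) - theta*s_i.
    """
    s = np.asarray(s, dtype=float)
    n_c = np.count_nonzero(s > 0)        # N_C: number of inter-cluster gaps
    n_gaps = s.size                      # N exceedances give N - 1 gaps
    return ((n_gaps - n_c) * np.log(1.0 - theta)
            + 2.0 * n_c * np.log(theta)
            - theta * s.sum())

def kgaps_mle(s):
    """Numerical maximum likelihood estimate of the extremal index theta."""
    res = minimize_scalar(lambda t: -kgaps_loglik(t, s),
                          bounds=(1e-6, 1.0 - 1e-6), method="bounded")
    return res.x
```

Applied to gaps drawn from the limiting exponential-point mass mixture with parameter $\theta$, `kgaps_mle` recovers $\theta$ up to sampling error.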
Nonstationary simulations
We simulated ARMA(p, q) sequences with various extremal indexes, and concatenated them to form processes with abruptly changing extremal characteristics. All the ARMA(p, q) sequences satisfied condition $D^{(1)}(u_n)$, so that the maximum likelihood estimator built on 1-gaps behaved well.

To compare the maximum likelihood, the intervals, and the iterative least squares estimators in a nonstationary framework, we estimated the extremal index of the series with a smoothing interval of length 7301, with thresholds corresponding to F(u_1) = 0.95 and F(u_2) = 0.98. For every estimation method, $\theta$ was taken to be constant over the whole window. A weighted form of the maximum likelihood estimator was used, with weights derived from a truncated normal kernel; to estimate the variance of the method, we applied a parametric bootstrap consisting of simulating homogeneous Poisson processes and finding the 95% pointwise percentile intervals.
The intervals and the iterative least squares estimators allow only for the use of equal weights. The same thresholds and the same window length were used as for the maximum likelihood estimator, and the bootstrap method proposed by Ferro and Segers (2003) and described in Section 2.1.5 was applied to obtain confidence intervals.

Figure 3.5: Comparison of the weighted MLE, intervals and iterative least squares estimators with thresholds given by F(u_1) = 0.95 (left panels) and F(u_2) = 0.98 (right). The estimators are the weighted maximum likelihood (top), the intervals (middle), and the iterative least squares (bottom). Heavy solid lines: estimates; thin dashed: 95% percentile bootstrap CI; thin solid: true extremal index.
The results for one of the simulated examples are shown in Figure 3.5; the two processes composing the estimated sequence were the same ARMA(1,4) as in our example of Section 3.1.5 ($\theta = 0.25$) for the first half of the sequence, and another ARMA(1,4) with AR coefficient 0.38 and MA coefficients 0.74, 0.72, 0.05 and 0.01 ($\theta = 0.13$) for the second. The maximum likelihood estimate looks more stable for both thresholds than the other two, and the step-function character of the true extremal index is clearer from its estimates.
Figure 3.6: The deseasonalized Central England winter temperatures (black spikes) with the cubic spline-smoothed ten-year median of the deseasonalized temperatures (thick white line). The thin white line marks zero.
3.2 Data analyses
3.2.1 The Central England temperature series
As a simple example of the use of local linear smoothing methods for the estimation of extreme clusters, we assess possible changes in the extremal index of unusually cold winter temperatures in the Central England Temperature data between 1 January 1772 and 31 December 2004. The data set can be downloaded from the website of the British Atmospheric Data Centre (http://badc.nerc.ac.uk/). We will focus more on the smoothing technique than on possible problems with threshold and run parameter selection. We use a threshold chosen by a rough serial GPD modelling of the complete deseasonalized-detrended series, corresponding to the 0.98-quantile, and K = 1, found to be acceptable by checking condition $D^{(1)}(u_n)$.

The Central England temperatures are a series of daily mean temperatures, representative of a roughly triangular area of the United Kingdom enclosed by Preston, London and Bristol. The data were created using one or more measurement series from several sites in this area over overlapping periods, and homogenised using all available metadata (Parker et al., 1992). The series stretches over 233 years and provides excellent test data for the application of kernel methods.

Since the series has both a seasonal component and a trend, a two-step procedure was applied to obtain a series having roughly the same range over the whole period, enabling us to determine common thresholds. Yearly seasonal variation in the median and in the dispersion
was approximated by a cubic spline-smoothed median and median absolute deviation taken for each day of the year over the 233 years of the series, then used to center and scale the data. For a random sequence $X_i$ with means $m_i$ and variances $s_i^2$, standardization is achieved by the transformation
\[ X^*_i = \frac{X_i - m_i}{s_i}, \qquad (3.7) \]
which gives the resulting variables $X^*_i$ zero mean and unit variance. In the absence of information about the precise mean and variance of the process, $m_i$ and $s_i$ were replaced by estimates. For these studies we used the median and the median absolute deviation as robust estimators of these two quantities.
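A stripped-down version of this robust standardization, using the raw per-day-of-year median and median absolute deviation in transformation (3.7) and omitting the cubic spline smoothing step, might look as follows; the function names are illustrative.

```python
import numpy as np

def robust_standardize(x, m, s):
    """Transformation (3.7): X*_i = (X_i - m_i) / s_i."""
    return (np.asarray(x, dtype=float) - np.asarray(m)) / np.asarray(s)

def day_of_year_stats(x, doy):
    """Median and median absolute deviation for each day of the year,
    pooled over all years (the thesis additionally smooths these with
    cubic splines before centering and scaling)."""
    x = np.asarray(x, dtype=float)
    med = np.empty_like(x)
    mad = np.empty_like(x)
    for d in np.unique(doy):
        mask = doy == d
        m = np.median(x[mask])
        med[mask] = m
        mad[mask] = np.median(np.abs(x[mask] - m))
    return med, mad
```

By construction, within each day of the year the standardized values have median 0 and median absolute deviation 1 exactly.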
The result of the deseasonalizing is shown in Figure 3.6. The longer-term trend, which seems to become non-negligible in the last decades, was removed by centering and scaling the deseasonalized series with cubic spline-smoothed 10-year medians and median absolute deviations. At the ends of the series, we repeated the first five years before 1 January 1772 and the last five after 31 December 2004. To apply standard methods for maxima, the series was then negated, and the sequence of 1-gaps calculated by selecting the exceedances above a threshold corresponding to the 0.975-quantile; finally, we dropped the gaps ending in the months from March through November. This method keeps gaps beginning before 1 December, sometimes stretching over a few seasons; as we later showed, concatenating the winters before the threshold selection and the calculation of gaps does not bias the estimates, and could have been applied here too.
To check condition $D^{(1)}(u_n)$, we divided the series into four periods, 1772–1830, 1830–1889, 1889–1948 and 1948–2004, and we computed the surface $p(u, r)$ for the values F(u) = 0.95, …, 0.995 and r = 4, 5, 6, 7, 12, 20 for each period. The result is shown in Figure 3.7. Although the proportion of anti-$D^{(1)}(u_n)$ events is 0.1, it remains approximately constant over the whole plane (u, r), so, for the sake of estimation, we supposed that the resulting bias would remain approximately constant and relatively small, thus allowing us to detect any nonstationarity.
Corresponding to a description by a fixed model with parameters varying with time, we suppose that there exists a limiting Poisson process of the exceedances for the Central England temperature data, but that it is inhomogeneous, that is, the parameter is changing over time: at any point $t$ the limiting distribution of the inter-exceedance spacings is
\[ P\bigl\{\bar F(u_n; t)\, S(u_n) > s\bigr\} \to \theta(t)\exp\{-\theta(t)s\} \quad \text{for } s > 0 \text{ as } n \to \infty, \]
where $\theta(t)$ is the extremal index, a smooth function of the time. However, in this case of nonstationarity the existence of the extremal index is not assured by any theoretical result.

Figure 3.7: Condition $D^{(1)}(u)$ for the Central England temperatures, separately for the four periods 1772–1830 (top left), 1830–1889 (top right), 1889–1948 (bottom left), 1948–2004 (bottom right).
To compare the maximum likelihood, the intervals, and the iterative least squares estimators, we estimated the extremal index of the series with two different smoothing time intervals, h = 3001 and 7301 (approximately 33- and 80-year periods), with threshold 2.0. To model the temporal behaviour of the extremal index we applied a logit-transformed local linear form for $\theta$ as a function of time. We used direct optimization of the locally weighted log-likelihood
\[ \ell^{(W)}(t_{i_0}) = \sum_{i=1}^{K} w(t_i)\, \ell\{\theta(t_i)\}, \qquad (3.8) \]
where $K$ is the number of clusters in the interval centered on the cluster at time $t_{i_0}$, the $t_i$ are the locations of clusters, $\ell\{\theta(t_i)\}$ is the likelihood contribution from the cluster beginning at $t_i$, and
\[ \theta(t_i) = \frac{\exp\{\beta_{0,t_{i_0}} + \beta_{1,t_{i_0}}(t_i - t_{i_0})\}}{1 + \exp\{\beta_{0,t_{i_0}} + \beta_{1,t_{i_0}}(t_i - t_{i_0})\}}, \qquad (3.9) \]
with $\beta_{0,t_{i_0}}$ and $\beta_{1,t_{i_0}}$ the parameters of the model in the smoothing interval centered at $t_{i_0}$. The weights $w(t_i)$ were given by a truncated normal kernel. For this setup, a parametric bootstrap based on the fitted parameters $\{\hat\beta_{0,t_{i_0}}, \hat\beta_{1,t_{i_0}}\}$ with 500 repetitions was used to assess estimation variability. For each interval, we simulated the inhomogeneous Poisson process with intensity determined by $\hat\theta(t_i)$, repeated the estimation for the samples, and took the 0.025- and 0.975-quantiles. The intervals and the iterative weighted least squares procedure were applied in the same way as in Section 3.1.6.

Figure 3.8: Comparison of the weighted local linear MLE and intervals estimates of the extremal index of the CET, with threshold $u^{(2)} = 2.0$ and 7301-day smoothing intervals. The local linear maximum likelihood estimate is shown on the left panel, the intervals estimate on the right (heavy solid lines on both panels), both with the 0.95 percentile bootstrap CI (thin solid).
Figure 3.8 shows the results obtained by local linear smoothing and by intervals estimation for the 7301-day smoothing interval and threshold 2.0. The intervals and the iterative least squares estimates were fairly similar. The bias caused by the possible failure of $D^{(1)}(u_n)$ appears to be small. After about 1850, the two estimates are much the same, since their confidence bands overlap strongly. Before 1850, there is a difference between the maximum likelihood and the intervals estimate: the maximum likelihood method shows a pronounced peak of about 0.6 where the intervals (and the iterative least squares) give nearly constant values around $\theta = 0.4$. The difference might be due to the use of equal weights; using non-equal weights tends to accentuate the local characteristics of the time series around the center of the estimation window. With a long window, a local feature in the series is masked by the equally weighted farther periods in the intervals and the least squares estimation; in investigations with a shorter window, these estimators too showed a peak here, though with a lower maximum. Another plausible explanation would be $D^{(1)}(u_n)$ failing temporarily, but this was not confirmed by the analysis of the anti-$D^{(1)}(u_n)$ events. For this example, we did not examine the misspecification tests; the next example, the Neuchâtel temperatures, will suggest that very local changes in the processes can entail large differences in estimates.
For the period 1772–1980 both the intervals and the iterative least squares methods suggested the possibility of a constant extremal index with a value slightly below 0.5. For the local linear maximum likelihood estimation, this was checked by a bootstrap procedure: for threshold 2.0, we resampled the winter inter-cluster 1-gaps and cluster sizes with equal probabilities, and performed weighted local linear maximum likelihood estimation on 4000 repetitions, resulting in a collection of extremal index curves. These were then smoothed and their values on a common time grid were calculated to obtain pointwise and overall percentile confidence bands for the time-dependent curves. We show the result in Figure 3.9. Although there is some suggestion of a weak downward tendency, an overall extremal index value of 0.5 seems plausible for most of the observed period, even for the period 1800–1830. In the last few years, the estimated extremal index has a strong upward tendency that is shown by all three estimation methods and with all window lengths tried. This is not significant in Figure 3.9, but incites a reinvestigation of the process after several future years.

Figure 3.9: Check of the hypothesis of constant extremal index with the threshold $u^{(2)} = 2.0$. Heavy solid line: estimated extremal index for the true sequence; thin solid lines: median, 0.025- and 0.975-percentile bootstrap confidence interval; thin dashed lines: 95% overall confidence interval; thin straight line at $\theta = 0.5$: plausible constant value for the extremal index.
3.2.2 The Neuchâtel daily minimum temperatures
Our next test data set $X_t$ is a series of daily minimum temperatures from the Swiss city of Neuchâtel, covering the period 1 January 1901 to 31 May 2006. The site, Neuchâtel, is a low-elevation lake-shore urban site with a stable population of 30,000 since the beginning of the measurements, and abundant vegetation and extended forests in the surroundings. The station has never been relocated since 1901; the transition to automatic measurement happened on 1 January 1978. In order to avoid finding false nonstationary effects, the homogeneity of the sequence was checked with the changepoint detection software RHtestV2 (http://cccma.seos.uvic.ca/ETCCDI). It showed a significant jump of 0.8 °C allocated to August 1987; MeteoSwiss metadata do not show any event around this time. This change-point at the end of the eighties coincides with a rapid increase also visible in the 12 mean temperature sequences from Switzerland homogenised and analysed by Begert et al. (2005), as well as in the 131 temperature series of the greater Alpine region (Auer et al., 2007). We thus acknowledge the homogeneity of the Tmin series, accepting the reality of this sharp change.

Figure 3.10: The deseasonalized Neuchâtel daily minimum temperatures (black spikes) with their trend, estimated by a cubic spline-smoothed 10-year moving median (heavy white line). The thin white line at 0 provides visual help to appreciate the trend.

We chose to investigate the clustering behaviour of the minimal summer minimum temperatures. We kept only data from the summer months, and treated the resulting sequence of summers as a continuous time series. All results described in the sequel refer to this series. Deseasonalizing and detrending were performed in the same way as for the Central England temperatures. The sequence showed not only a warming trend in its median, as is clear in Figure 3.10, but once seasonal variation and slow trend were both removed from its mean and variance, a GEV analysis of the negated monthly minima of the rescaled-centered series, using time and relative air humidity as covariates, showed a significantly growing scale parameter from approximately 1970, as summarized in Appendix A (Süveges et al., 2008). Using monthly minima corresponds to unusually small blocks for a GEV approximation, but diagnostics showed a good fit. This suggests that the distribution of the monthly minimum temperature anomalies has recently been becoming wider. Is this accompanied by a change in the tendency to form clusters, or, in a broader sense, a change in the serial dependence structure at extreme levels? What do threshold methods say about the behaviour of the minimum temperatures at extreme levels? Does an analysis of the complete sequence show any more warning signs against the assumption of stationarity for the anomalies?
[Figure panels: mean excess versus threshold; $\xi(u)$ versus F(u); $\psi(u) - u\,\xi(u)$ versus F(u).]
Figure 3.11: Classical threshold selection plots for the complete Neuchâtel minimum temperature series. The 0.95- and 0.98-quantiles are indicated by dotted vertical lines on the left panel; the dashed horizontal lines help to assess the stability of the parameter estimates of the serial GPD fit.
Assuming in the first instance stationarity for the 1901–2006 deseasonalized-detrended sequence, we performed the classical threshold selection procedures, the check for the stability of the GPD parameter estimates and the mean excess plot. The scale parameter showed a breakpoint around the 0.98-quantile; the other two diagnostics, namely the shape parameter of the GPD fit and the mean excess plot (Figure 3.11) for the complete sequence from 1901 to 2006, also hinted at a breakpoint, though they did not rule out acceptable fits from fairly low thresholds. The fit using the declustered exceedances above the 0.99-quantile gives estimates $\hat\xi = 0.09\ (0.2)$ and $\hat\psi = 0.23\ (0.07)$ for the shape and the scale parameters. The diagnostic plots are presented in the top row of Figure 3.12.
The bottom row of Figure 3.12 shows a GPD fit using the 0.95-quantile as threshold. We might be interested not only in the behaviour of the bulk of the data, which is generally approximated by a normal distribution, or in the very high extremes of the data described by an asymptotically valid extreme-value distribution, but in the characteristics of somewhat lower quantiles too. Heatwaves are an example: they may be characterized as long series of temperatures above a moderately high threshold such as the 0.95-quantile. These temperatures dominate prolonged heatwave periods, with severe impact on human health and catastrophic effects on agriculture when associated with drought. Therefore, it is worth seeking statistical models for them. Extreme-value distributions are a reasonable guess, as long as the diagnostic methods show them acceptable, and as long as we bear in mind that their validity is restricted. For the Neuchâtel minimum temperature series, the diagnostics in the bottom row reveal an acceptable fit, which yields a fairly good distributional model for observations higher than the 0.95-quantile, with estimates $\hat\xi = 0.21\ (0.06)$ and $\hat\psi = 0.4\ (0.04)$; but we must not forget that it has no asymptotic validity as thresholds go higher. For extrapolating to the far tails, we must model the data separately and use the fit above the 0.98-quantile.

Figure 3.12: Classical diagnostic plots for the stationary GPD fits for the complete Neuchâtel Tmin series. The top row refers to a threshold such that F(u) = 0.99, the bottom to F(u) = 0.95. Left column: the fitted generalized Pareto distribution (solid line) and the exceedances (circles). Middle: the residuals versus time. Right: quantile-quantile plots of the residuals against the theoretical quantiles.
Repeating the diagnostics in three 41-year-long periods centered on the years 1925, 1955 and 1985, the subperiods revealed possibly different behaviour, although all of them had a similar breakpoint at nearly the same threshold value (Figure 3.13). The earliest subperiod presents an increasing shape and a decreasing scale parameter, giving a strong impression of nonstable behaviour with respect to thresholds, though the confidence intervals allow us to imagine a stable estimate at very high thresholds. The second, in the middle of the century, seems to show opposite trends in its scale and shape parameters compared to the previous 41 years, but with larger confidence intervals. The last one, at the end of the century, seems the most stable of all: the breakpoint nearly disappears, and the confidence bands allow consistent modelling from relatively low thresholds.
Figure 3.13: Classical threshold selection plots for three 41-year windows centered on 1925, 1955 and 1985 for the Neuchâtel minimum temperature series. The left column refers to 1925, the middle to 1955, the right to 1985. The top row shows the shape parameters of the serial GPD fits, the middle row their scale parameters, the bottom row the mean excess plots. The dotted vertical lines on the bottom panels indicate the position of the 0.95- and 0.98-quantiles; the dashed horizontal lines on the two top rows help to assess the stability of the parameter estimates of the serial GPD fit.
Suggestions of nonstable behaviour on time-scales of a few decades come therefore from three sources of information: there is a clear nonstationarity in the mean of the sequence (Figure 3.10); the GEV analysis of monthly minima with covariates found a significant time dependence of the scale parameter; and the serial GPD fits in three subperiods also hinted at possibly fluctuating behaviour. Classical methods did not bring up arguments against assuming the series stationary, and a GPD fit acceptable by all classical diagnostic methods could be obtained using a threshold slightly above the 0.98-quantile. Nevertheless, these signs of possible nonstationarity suggest some care when trying to estimate clustering characteristics over time. Marginal changes over time in the extremes can lead to the discovery of false time effects in the clusters, simply due to varying frequency of exceedances.

Figure 3.14: Misspecification for the complete Neuchâtel daily maximum temperatures $X_t$. Horizontal foreground axis: threshold; horizontal left axis: run parameter K; vertical axis: the value of the test statistics. Left panel: $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$; middle: $T(\hat\theta^{(K)})$. The blue lines correspond to 1 on the left plot and to the critical 0.98-quantile of the $\chi^2_1$ distribution on the middle and the right. The red surface on the left plot is the standard error of $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$.

This effect can be avoided either by using a varying threshold, or by marginally standardizing the series with a local semi-empirical transformation to unit Fréchet margins. With a window of length 41 years, a truncated normal kernel and a threshold corresponding to the local 0.98-quantile, a smooth weighted generalized Pareto model was fitted to the negated, deseasonalized-detrended data, to obtain the time-dependent parameters $\hat\xi(t)$ and $\hat\psi(t)$ at time $t$. The semi-empirical transformation to unit Fréchet margins (Coles and Tawn, 1994) was applied using the local shape and scale parameters $\hat\xi(t)$ and $\hat\psi(t)$, which resulted in a sequence $X^*_t$ hopefully closer to marginal stationarity. A quantile-quantile plot confirmed an acceptable, approximately unit Fréchet margin for the whole sequence.
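A stationary version of this semi-empirical transformation, in the spirit of Coles and Tawn (1994), can be sketched as follows; the function name is ours, a fixed threshold replaces the local time-varying one, and the GPD shape is assumed nonzero.

```python
import numpy as np

def to_unit_frechet(x, u, xi, psi):
    """Semi-empirical probability integral transform to unit Frechet margins.

    Below the threshold u the distribution function is estimated empirically;
    above u it is 1 - lambda * (1 + xi*(x - u)/psi)^(-1/xi), where lambda is
    the observed tail fraction. Assumes xi != 0 and a positive GPD argument.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    lam = np.mean(x > u)                      # tail fraction Fbar(u)
    ranks = np.argsort(np.argsort(x)) + 1     # empirical cdf, shifted so F < 1
    F = ranks / (n + 1.0)
    above = x > u
    F[above] = 1.0 - lam * (1.0 + xi * (x[above] - u) / psi) ** (-1.0 / xi)
    return -1.0 / np.log(F)                   # unit Frechet quantile transform
```

Applied to a sample with a GPD tail, the transformed values are positive and have approximately unit Fréchet margins (median near $1/\log 2 \approx 1.44$).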
The misspecification tests offer a new possibility to check the appropriateness of a model defined by a given threshold, so as the first step we calculated the sequence of K-gaps, $S^{(K)}_i(u)$, for K = 1, …, 12 and for thresholds corresponding to F(u) = 0.95, 0.96, 0.97, 0.98 and 0.99. Then we calculated the two misspecification test statistics based on the complete gap sequence $S^{(K)}_i(u)$, the ratio $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$ and $T(\hat\theta^{(K)})$, for every combination of threshold $u$ and run parameter $K$. The result is shown in Figure 3.14. The test statistics have similar overall behaviour, showing a higher track along the threshold F(u) = 0.98. The surfaces warn us to avoid the threshold corresponding to F(u) = 0.98, since there the ratio $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$ exceeds the limit defined by its empirical standard error and $T(\hat\theta^{(K)})$ exceeds the level $\chi^2_1(0.95)$. This is exactly the threshold where the classical methods, the mean excess plot and the serial GPD fits, showed the breakpoint. The values of the test statistics are everywhere quite close to the misspecification boundary, meaning that the model, though acceptable, is nowhere good, and there are no (u, K) pairs which can improve it.
The somewhat surprising conclusion of the information matrix tests is that the exponential-point mass mixture model is misspecified at the threshold F(u) = 0.98. Though this is the location of the breakpoint shown by the classical selection procedures, the GPD parameter estimates are stable from F(u) = 0.98, and the classical guidelines advise us to use the lowest possible threshold, that is, around F(u) = 0.98. Thus, the point process model, on which the exponential-point mass mixture model is based, is invalid at the threshold which seems the best for GPD modelling. More information might be gained from a misspecification test based directly on the GPD or the GPD-Poisson likelihood, but this is not implemented here. Also, the ratio $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$ can be considered an additional warning against blind application of the stationary models. Its value is uniformly high for every (u, K), and hints at a source of misspecification other than the choice of (u, K). This adds nonstationarity to the list of possible sources of misspecification.
Since the issue was possible nonstationarity, we checked model misspecification as a function of time: we put window centers successively at 15 July of each year, and calculated the weighted version of the ratio $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$, with
\[ \hat J(\hat\theta^{(K)}) = \sum_{i=1}^{N-1} w_i\, \dot\ell(\hat\theta, S_i)^2, \qquad \hat I(\hat\theta^{(K)}) = -\sum_{i=1}^{N-1} w_i\, \ddot\ell(\hat\theta, S_i), \]
and the test $T(\hat\theta^{(K)})$ used in the first step, for every combination of threshold $u$ and run parameter $K$. We defined the time of any gap as the time of its endpoint. The information matrix test $T(\hat\theta^{(K)})$ in its present form does not allow for the use of weights; these statistics were calculated for each of the same 106 windows in their non-weighted form defined by equations (3.3). The weights $w_i$ and the window length were the same as in the estimation of the generalized Pareto model. The calculations thus gave 106 sets of $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$ ratios, empirical variances, $T(\hat\theta^{(K)})$ values and extremal index estimates for all (u, K) pairs, for the transformed sequence $X^*_t$.
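The quantities $\hat J$ and $\hat I$ above can be computed directly from the two kinds of contribution to the K-gaps log-likelihood (3.2): a zero gap contributes $\log(1-\theta)$ and a positive scaled gap $s$ contributes $2\log\theta - \theta s$, so their derivatives in $\theta$ give the per-gap score and curvature. A minimal sketch, with an illustrative function name (the full statistic $T(\hat\theta^{(K)})$ of equations (3.3) also needs a variance estimate and is not reproduced here):

```python
import numpy as np

def info_matrix_ratio(theta, s, w=None):
    """Ratio Jhat/Ihat for the K-gaps likelihood evaluated at theta.

    s are the scaled gaps Fbar(u)*S_i^(K); w are optional smoothing weights
    (equal weights if None). Score and curvature come from differentiating
    log(1 - theta) (zero gap) and 2*log(theta) - theta*s (positive gap).
    """
    s = np.asarray(s, dtype=float)
    w = np.ones_like(s) if w is None else np.asarray(w, dtype=float)
    pos = s > 0
    score = np.where(pos, 2.0 / theta - s, -1.0 / (1.0 - theta))
    curv = np.where(pos, -2.0 / theta ** 2, -1.0 / (1.0 - theta) ** 2)
    j_hat = np.sum(w * score ** 2)       # Jhat = sum of weighted squared scores
    i_hat = -np.sum(w * curv)            # Ihat = minus sum of weighted curvatures
    return j_hat / i_hat
```

Under a correctly specified model the information identity makes this ratio close to 1, which is why a ratio well away from 1 flags misspecification.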
Corresponding to the three main sources of misspecification in an extreme-value analysis setup, the threshold selection, the choice of the run parameter and the nonstationarity, the test statistics proposed above must depend on at least three variables: threshold, run parameter and time.

Figure 3.15: Misspecification as a function of time for the Neuchâtel daily maximum temperatures. Horizontal foreground axis: threshold; horizontal left axis: run parameter K; vertical axis: $T(\hat\theta^{(K)})$. The blue lines correspond to the critical 0.98-quantile of the $\chi^2_1$ distribution. The year of each window center is indicated above the plots (1906, 1915, 1924, 1933, 1942, 1951, 1960, 1969, 1978, 1987, 1996, 2005).

The task of surveying such a quantity of figures seems overwhelming. To gain a first impression of the time dependence of the test statistics, we plotted the surfaces $T(\hat\theta^{(K)})$ and $\hat J(\hat\theta^{(K)})/\hat I(\hat\theta^{(K)})$, the latter together with its empirical variance, as functions of $K$ and $u$, and made movies of the changes of the surfaces in time. Twelve instants of the movie about the statistic $T(\hat\theta^{(K)})$ are presented in Figure 3.15. This very suggestive visual method led to a few conclusions:
• In the sequence, only the combination $F(u) = 0.99$, $K \ge 2$ behaved well in all three test statistics over the whole 106 years of data. The best region seemed to be $F(u) = 0.99$, $K = 4$ or $5$, with all test statistics closest to their ideal value over the whole century. In several subperiods, other models can also be used. The movies suggest a complex behaviour of the process, fluctuating in time; estimates concerning the middle of the century at thresholds with $F(u) < 0.99$ are based on misspecified models.

• The period where every test statistic agreed in a bad misspecification for any combination other than $F(u) = 0.99$, $K \ge 1$ was the middle of the 20th century; this can be appreciated especially in the last panel of the second row in Figure 3.15 and in the panels showing the ratio $\bar J(\hat\theta^{(K)})/\bar I(\hat\theta^{(K)})$ in 1955 for $F(u) = 0.95$ and $F(u) = 0.97$ in Figure 3.16. Smaller instabilities were also found in the last decades, though these rarely exceeded the critical $\chi^2_1$-quantile. These two periods roughly coincide with the strongest nonstationarity in the mean behaviour of the temperatures (see Figure 3.10: the most marked periods of change in the 10-year median are in the 1940s and in the 1980s).
More detailed checks confirmed the impressions from the movies. Figure 3.16 shows the dependence on the run parameter of the two statistics $\bar J(\hat\theta^{(K)})/\bar I(\hat\theta^{(K)})$ and $T(\hat\theta^{(K)})$ for the years 1925, 1955 and 1985, and for threshold choices such that $F(u) = 0.95$, $0.97$ and $0.99$. The plots strongly suggest the use of the threshold $F(u) = 0.99$ and $K = 4$; results obtained using lower thresholds can be questionable, though the behaviour of the extremal index at these levels may be investigated with $K = 1$ at the beginning and the end of the century. The right panel of Figure 3.17 shows the result of the weighted smooth maximum likelihood estimation for all thresholds with this run parameter. The plot indicates periods of misspecification by showing the lines in grey. The left panel presents the estimate with the best combination $F(u) = 0.99$ and $K = 4$; this is valid at every time point.
[Figure 3.16: a grid of diagnostic panels (columns: 1925, 1955, 1985; row triplets: $F(u) = 0.95$, $0.97$, $0.99$), plotted against the run parameter $K = 2, \ldots, 12$.]
Figure 3.16: The diagnostic plots for 41-year periods with center in the years 1925 (left column), 1955 (middle) and 1985 (right). The top triplet of rows shows the weighted ratio $\bar J(\hat\theta^{(K)})/\bar I(\hat\theta^{(K)})$ (uppermost), the test statistic $T(\hat\theta^{(K)})$ (middle), and the corresponding estimate of the extremal index (lowermost) versus the run parameter $K$ with threshold $F(u) = 0.95$; the middle triplet is the same for $F(u) = 0.97$, and the bottom triplet for $F(u) = 0.99$.
[Figure 3.17: two panels, "Confidence intervals" (left) and "Threshold dependence" (right), showing the estimated extremal index against the years 1900-2005.]
Figure 3.17: The maximum likelihood estimate using the combination $F(u) = 0.99$, $K = 4$ (left panel, thick solid line), together with confidence intervals based on nonparametric bootstrap (thin solid lines) and on asymptotic normality (thin dashed lines). The threshold dependence of the estimates for $u$ such that $F(u) = 0.95$ (solid line), $F(u) = 0.96$ (long-dashed), $F(u) = 0.97$ (dashed), $F(u) = 0.98$ (dash-dotted) and $F(u) = 0.99$ (dotted), all with $K = 1$, is shown in the right panel. Grey indicates the periods where $\bar J(\hat\theta^{(K)})/\bar I(\hat\theta^{(K)})$ or $T(\hat\theta^{(K)})$ detected misspecification.
There are a number of ways to assess uncertainty. The simplest is based on the asymptotic normality, but it can perform poorly, especially if the number of observed K-gaps in the investigated period is small, or if the estimated extremal index is close to the parameter space boundaries, 0 or 1. Another possibility is the block bootstrap: the simplest way, assuming stationarity within our 41-year period of interest, is to resample the years with equal probability, repeat the estimation on each repetition, and take percentiles. A slightly different approach is to identify clusters by runs declustering with $K$, and resample clusters and inter-cluster distances conditioned on the length of the series. The parametric bootstrap, which uses the estimated extremal index to form repetitions of the point process of exceedances, can be used only with run parameter $K = 1$; for $K > 1$, to reconstruct the process in a fully parametric way, we would need to know the distribution of the within-cluster gaps, about which the model provides no information. Figure 3.17 compares the nonparametric block bootstrap and asymptotic normality for the calculation of confidence intervals for the best threshold-run parameter combination. The two differ slightly, the bootstrap interval being larger. The difference is largest in the middle period, which shows signs of quick change. Since this may invalidate our assumption of near-stationarity for the 41 years of the windows, the block bootstrap confidence interval may also give misleading results about uncertainty.
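The year-resampling variant of the block bootstrap described above can be sketched as follows; the function name, the generic `estimator` argument and the default settings are ours, for illustration only.

```python
import numpy as np

def year_block_bootstrap(series_by_year, estimator, n_boot=200, level=0.95,
                         seed=0):
    """Nonparametric block bootstrap over calendar years (a sketch).

    series_by_year : list of 1-D arrays, one per year of the 41-year window
    estimator      : function mapping a concatenated series to a scalar,
                     e.g. an extremal index estimator
    Returns the equal-tailed bootstrap confidence interval.
    """
    rng = np.random.default_rng(seed)
    n_years = len(series_by_year)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # resample whole years with replacement, preserving the
        # within-year dependence structure
        idx = rng.integers(0, n_years, size=n_years)
        stats[b] = estimator(np.concatenate([series_by_year[i] for i in idx]))
    alpha = 1.0 - level
    return np.quantile(stats, [alpha / 2, 1.0 - alpha / 2])
```

Resampling whole years keeps extreme clusters intact, which is what makes the scheme appropriate for serially dependent exceedances.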
[Figure 3.18: two panels showing the estimated extremal index against the years 1900-2005.]
Figure 3.18: Comparison of the maximum likelihood estimator with $K = 4$ and $F(u) = 0.99$ (heavy solid in both panels), the intervals (left panel, thick dashed) and the iterative least squares (right panel, thick dotted) estimators. The 95% confidence intervals of the estimators are plotted with the thin versions of the lines.

Finally, comparisons to the intervals and the iterative least squares estimators are also presented in Figure 3.18. For both, the same window of 41 years centered at the year in question was
slid over the 106 years, and as in the case of maximum likelihood, we assumed near-stationarity within the window. This may induce stronger bias in the estimates than for the maximum likelihood estimate, since there is no way to downweight observations far away from the centre of the window and thus to weaken the effect of nonstationarity. It can be seen that not only the maximum likelihood estimates, but also the intervals and iterative least squares estimates, suggest a varying extremal index, with a drop in the mid-century and near the end of the 20th century. Repeating the intervals and IWLS estimation at all five thresholds, we again find the threshold-dependence pattern of the extremal index estimates, similar to the right panel of Figure 3.17. At intermediate thresholds, for which the ratio $\bar J(\hat\theta^{(K)})/\bar I(\hat\theta^{(K)})$ indicated misspecification in the middle of the century, these two estimators too found a decreasing extremal index over the century, though with slightly lower values everywhere. Parallel investigations, not presented here, showed a weaker but similar trend for the warmest daily minimum temperature anomalies, but no change in the extremal index of either the hottest or the coolest daily maximum temperature anomalies.
The study confirmed the existence of a strong threshold dependence in the parameters of its extreme-value distribution. This may be linked to the existence of a breakpoint, shown by classical threshold selection methods, around thresholds corresponding to the 0.98-quantile; it causes an abrupt jump in the value of the parameters of the GPD distribution and a less abrupt change in the extremal index. The reason for this change is unknown; a simple explanation could for example be that the observations above the 0.95-quantile come from a mixture parent distribution, the components of which have different extreme-value limits, and stabilization above a threshold happens when one component becomes dominant in the exceedances above this level.
A different reason might be the long-memory character of the process of daily minimum temperatures. In this case, standard extreme-value theory could not be applied, since the fundamental asymptotic independence conditions are not satisfied, and we cannot expect the standard limits to be valid. More studies are necessary to investigate this possibility. If long memory proved to be a common occurrence in climatic time series, then assessing and forecasting global changes at extreme levels would become a much more difficult research area, with practically no straightforwardly applicable methodology available at present.
The study suggests the existence of strong shorter-period fluctuations in the process. The moving-window misspecification tests revealed the strongly varying character of subperiods, and showed that between roughly 1925 and 1965, at commonly used thresholds, the point process of extreme clusters cannot be well described by a marked Poisson process. The use of a very high threshold, corresponding to the 0.99-quantile, is acceptable. At this threshold, estimates are different in subperiods of the century, which was found to be acceptably stationary as a whole by standard methods. Common practice in the assessment of the expected effects of climate change at extreme levels is to compare periods of a few decades; the oscillations found in the characteristics of the exceedances on such time-scales warn us to make such comparisons with caution.
3.3 Summary
The chapter presents likelihood-based inference methods for the clustering characteristics of the extreme events of a random process. The maximum likelihood estimator based on the K-gaps exploits the compound Poisson character of the times of the extreme clusters, more precisely, the limiting exponential-point mass mixture distribution of the inter-exceedance times. To apply it in practice, we need to truncate the inter-exceedance times, which corresponds to the choice of a run parameter, or equivalently, to an explicit assumption about the dependence range of the process.
This inference thus requires a careful investigation of possible misspecification in the model. The main motivation comes from the necessity of the run parameter selection. A good choice of run parameter entails independence of the truncated nonzero inter-exceedance gaps, and we can use an independent likelihood. In an extreme-value model, there is always an additional source of possible misspecification, namely the threshold choice. To these two, a third factor is added in most practical applications of extreme-value statistics: the issue of nonstationarity. To recognize these factors, all entailing model misspecification, we adapted methods of detecting model misspecification from econometrics (White, 1982). These methods are based on test statistics tracing departures from the relation $J(\theta) = I(\theta)$ that is satisfied in well-specified regular models. Although we applied them here only for finding the best parameter combinations to reflect the compound Poisson character of the extreme clusters in a process, these methods are general. Straightforward modifications may yield similar tests in GEV or GPD models: in GEV models for the choice of block size, and in GPD models for the joint selection of the threshold and the declustering parameter.
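To make the ideas concrete, the following sketch computes the K-gaps maximum likelihood estimate of $\theta$ in closed form, together with the ratio of the two information estimates whose equality is being tested; the function name and interface are ours, and the exceedance rate is estimated empirically from the data.

```python
import numpy as np

def kgaps_theta_and_ratio(exceed_times, n, K):
    """K-gaps MLE of the extremal index theta and the J/I information ratio.

    exceed_times : indices of exceedances of the threshold u
    n            : length of the underlying series
    K            : run parameter (inter-exceedance times <= K give zero gaps)
    """
    exceed_times = np.sort(np.asarray(exceed_times, dtype=float))
    rate = exceed_times.size / n                    # estimate of 1 - F(u)
    c = rate * np.maximum(np.diff(exceed_times) - K, 0.0)  # normalized K-gaps
    n0 = int(np.sum(c == 0.0))                      # gaps truncated to zero
    n1 = c.size - n0
    s = float(c.sum())
    # closed-form root in (0, 1] of the score equation of the
    # exponential-point mass mixture log-likelihood
    if n0 == 0:
        theta = 1.0
    elif n1 == 0:
        theta = 0.0
    else:
        b = n0 + 2 * n1 + s
        theta = (b - np.sqrt(b * b - 8.0 * n1 * s)) / (2.0 * s)
    # per-gap score and curvature: n0 terms log(1 - theta),
    # n1 terms 2 log(theta) - theta * c_i
    score = np.empty_like(c)
    curv = np.empty_like(c)
    zero = c == 0.0
    if n0 > 0:
        score[zero] = -1.0 / (1.0 - theta)
        curv[zero] = 1.0 / (1.0 - theta) ** 2
    if n1 > 0:
        score[~zero] = 2.0 / theta - c[~zero]
        curv[~zero] = 2.0 / theta ** 2
    J = float(np.mean(score ** 2))   # outer-product information estimate
    I = float(np.mean(curv))         # negative-Hessian information estimate
    return theta, J / I
```

A J/I ratio far from 1 over a range of $(u, K)$ combinations would then signal misspecification, as in the moving-window analyses above.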
There are a few advantages of the likelihood method in the estimation of the extremal index
in univariate time series:
• Diagnostic methods are given to help choose the run parameter $K$; the tests based on asymptotic distributions of various combinations of the elements of the information sandwich (White, 1982, 1994), or calculating the ratio $J(\hat\theta^{(K)})/I(\hat\theta^{(K)})$ and plotting it as a function of $(u, K)$, may yield appropriate threshold and run parameter choices. Unfortunately, this restricts its use as an element in more complex methodology: careful selection of threshold and run parameter is not feasible in every situation. But once an overall run parameter and threshold are chosen, it is a flexible, simple element to build into possibly more sophisticated procedures such as smoothing, GAM or mixed-effects models.
• The maximum likelihood estimator is consistent and asymptotically normal under the right choice of $K$.
• Extension to the $D$-variate case is natural, though it has its own problems: the extremal index function of a $D$-dimensional data set as defined in Nandagopalan (1994) can be estimated pointwise on the $D$-dimensional unit simplex (Smith and Weissman, 1996), and any univariate method can be applied; we will show an example of such a use in Section 5.1. Of course, such applications raise difficulties about checking the run parameter specification on a grid. The situation is similar to temporal dependence, where we had to select the adequate run parameter and threshold separately for every window. The difference in the multivariate case is that misspecification should in any case be checked at least in a few regions of the unit simplex, and nonstationarity is a possible additional difficulty.
Chapter 4
M3 modelling
We present the second new contribution of the thesis, the idea of estimating the extremal clusters of a process based on the M4 approximation of Smith and Weissman (1996). We deal first with the simpler univariate case. This method finds typical patterns around extremes of a process, and may therefore provide more detailed information about the clusters than just estimating the extremal index, which is a summary measure. Section 4.1.1 gives an exposition of the theoretical background of the procedures, and summarizes the preliminary cluster identification, a necessary first step to generate the variables for which the model is constructed. Section 4.1.3 presents the basic assumption of the model, the Dirichlet mixture assumption; the proposed procedure is summarized in Section 4.2.1, and its performance is shown on simulation examples in Section 4.2.2. The method used to estimate the mixture, the EM algorithm, is described briefly in Appendix B.
4.1 M3 approximation
4.1.1 The model
The limit theorem of Smith and Weissman (1996), stating that the extremes of a large class of multivariate stationary dependent sequences can be approximated by a multivariate maxima of moving maxima (M4) process, offers an alternative way of estimating the extremal characteristics and the extremal index in the univariate case as well. Compared to the many existing well-developed methods, the M4 approach is harder to implement and is more computer-intensive, but it yields a detailed picture of the characteristic trajectory of the process in the neighbourhood of the extremes.
In the univariate case, we can take $D = 1$ in the definition (2.24) of Section 2.2.3. This process is called an M3 process, defined as

$$Y_t = \max_{k} \max_{l} a_{lk} Z_{l,t-k}, \qquad t \in \mathbb{Z},$$

where $\{Z_{li},\, l \in \mathbb{N},\, i \in \mathbb{Z}\}$ are sequences of independent unit Fréchet variables, and the coefficients $\{a_{lk},\, l \in \mathbb{N},\, k \in \mathbb{Z}\}$ are nonnegative constants satisfying $\sum_{l=1}^{\infty} \sum_{k=-\infty}^{\infty} a_{lk} = 1$. For the sake of estimation, we must take $l = 1, \ldots, L$ and $k = K_1, \ldots, K_2$ with finite $L$, $K_1$ and $K_2$; in the latter cases, we can fix $K_1 = 1$ and $K_2 = K$ without further loss of generality. Recall from Section 2.2 that the extremes of $Y_t$ are generated by the extreme values of the underlying unit Fréchet shock variables. As a consequence, in the extreme-value limit the neighbourhoods of extremes $\{Y_{t+1}, Y_{t+2}, \ldots, Y_{t+K}\}$ will have the form $\{a_{l1} Z_{lt}, a_{l2} Z_{lt}, \ldots, a_{lK} Z_{lt}\}$, with $l$ denoting the shock sequence in which the generating extreme value occurred at time $t$. The probability of the $l$th cluster type is $p_l = \sum_{k} a_{lk}$.
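The definition can be turned directly into a simulator; the sketch below (function name ours, for illustration) draws independent unit Fréchet shocks by inversion, $Z = -1/\log U$, and applies the moving-maximum filter.

```python
import numpy as np

def simulate_m3(a, n, rng):
    """Simulate Y_t = max_{l,k} a[l, k] * Z[l, t - k] with independent unit
    Frechet shocks Z; a is a nonnegative (L, K) filter matrix summing to 1."""
    L, K = a.shape
    # unit Frechet draws via inversion of F(z) = exp(-1/z)
    Z = -1.0 / np.log(rng.uniform(size=(L, n + K)))
    Y = np.empty(n)
    for t in range(n):
        # column t + K - 1 - k of the padded array holds the shock at time t - k
        Y[t] = max(a[l, k] * Z[l, t + K - 1 - k]
                   for l in range(L) for k in range(K))
    return Y
```

A single large shock $Z_{lt}$ then produces a run of $K$ consecutive large values with the profile of row $l$, which is exactly the signature structure exploited below.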
In the observed M3 process $Y_i$, if $Y_{t+1}, Y_{t+2}, \ldots, Y_{t+K}$ are the elements of an extreme cluster, its signature can be characterized by the relation

$$\frac{Y_{t+k}}{\sum_{j=1}^{K} Y_{t+j}} = \frac{a_{lk}}{\sum_{j=1}^{K} a_{lj}}, \qquad k = 1, \ldots, K,$$

for a specific $l$. There are as many signatures as independent Fréchet shock sequences. The right-hand side of this equation is constant and will be denoted by $c_{lk}$; obviously $\sum_{k=1}^{K} c_{lk} = 1$ for all $l$.
To find the limiting M3 process for an observed sequence $Y_i$, we suppose that the extreme clusters of $Y_i$ can be approximated by the extreme clusters of the limiting M3 process. We expect that as we investigate observations above increasing thresholds, we find clusters around extremes that become more similar to those of the limiting M3 process, with their signatures and their frequency in the series. Our purpose is to estimate the signature parameters $c_{lk}$ and the cluster probabilities $p_l$, to obtain an estimate of the extremal index and to calculate other cluster statistics. Implementing the theory in practice is not so easy, for the following reasons:
• The model is unidentifiable: a translation of the shock variables $Z'_{l,i+r} = Z_{li}$ together with a translation of the filter coefficients $a'_{lk} = a_{l,k+r}$ corresponds to the same observed process $Y_t$. A permutation of $1, \ldots, L$ also yields the same $Y_t$, with a corresponding permutation of the rows of $a_{lk}$. In estimation, this problem is remedied by fixing a choice for the position and the order. If there are two rows $l_1$ and $l_2$ of the filter matrix such that $a_{l_1 k} = C a_{l_2 k}$, where $C$ is a constant, the model is also unidentifiable.
• The basic variables for fitting the M3 model are the vectors $Y_{t+i}/\sum_{k=1}^{K} Y_{t+k}$, $i = 1, \ldots, K$. In practice, the only guideline for selecting these short segments from the observed series is the presence of exceedances; the exact positions and lengths of the clusters are unknown. We propose a procedure for the selection in Section 4.1.2.
• Finite thresholds, unaccounted effects, measurement errors and other factors blur the signatures in the observed process. The noise will be modelled by assuming that the variables $Y_{t+i}/\sum_{k=1}^{K} Y_{t+k}$, $i = 1, \ldots, K$, are realizations from an $M$-component mixture of Dirichlet distributions; this is described in Section 4.1.3.
4.1.2 Preliminary cluster identification
As a temporary solution for the second problem mentioned in Section 4.1.1, we implemented an iterative procedure to find an acceptable set of cluster variables.

(0) Select the exceedances $Y^*_{r^*}$, $r^* = 1, \ldots, R^*$, above a high threshold $u$; these correspond to $R$ clusters by a runs declustering scheme with a relatively large declustering parameter.

Then iteratively repeat the following procedure:

(1) Select sufficiently large index sets, the neighbourhoods $I^{(i)}_r$, $r = 1, \ldots, R$, around each group of exceedances, which seem likely to contain entire M3 clusters. Take the variables $W^{(i)}_r = (Y_j,\, j \in I^{(i)}_r)$.

(2) Use a K-means clustering method with a fixed number of cluster types to group the neighbourhoods $W^{(i)}_r$ and to find mean cluster profiles.

(3) Calculate the correlation of each neighbourhood $W^{(i)}_r$ with each of the mean cluster profiles, in shifted positions, for a range of shifts. Choose the positions of the best correlations, and redefine accordingly the index sets $I^{(i+1)}_r$, $r = 1, \ldots, R$, and the clusters of the next iteration, $W^{(i+1)}_r = (Y_j,\, j \in I^{(i+1)}_r)$.

(4) If the index sets did not change or oscillation has set in, accept $I^{(i+1)}_r$ as the most likely cluster positions, and choose the final index sets $J_r$ so that all exceedances are included and all clusters are of equal length (say $K$). The sequences $\{W^{(i+1)}_r\}$ will be the modelled data set. Otherwise, repeat the procedure from step (1).
In the absence of preliminary information, this procedure yields a plausible choice of extreme clusters, though it may be misleading, dividing cluster types by bad positioning or grouping distinct types into one by the location of their most prominent peak. Simulations showed, however, that these forms of misclassification still yield acceptable estimates for the extremal index.
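A minimal sketch of this iterative scheme follows, under simplifying assumptions of ours: the window length is fixed at $K$, the K-means step is a crude Lloyd iteration, and the realignment shifts are limited to $\pm 2$ positions. All function names are hypothetical.

```python
import numpy as np

def identify_clusters(y, u, K, L_types, n_iter=10, seed=1):
    """Iterative cluster identification around exceedances of u
    (a sketch of steps (0)-(4))."""
    rng = np.random.default_rng(seed)
    # (0)-(1): initial windows around runs of exceedances
    exc = np.flatnonzero(y > u)
    starts = [exc[0]] + [t for a, t in zip(exc, exc[1:]) if t - a > K]
    pos = np.array([min(max(s - K // 2, 0), len(y) - K) for s in starts])
    for _ in range(n_iter):
        W = np.stack([y[p:p + K] for p in pos])
        X = W / W.sum(axis=1, keepdims=True)       # scaled neighbourhoods
        # (2): crude K-means on the scaled windows
        centers = X[rng.choice(len(X), L_types, replace=False)]
        for _ in range(20):
            d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
            lab = d2.argmin(axis=1)
            for m in range(L_types):
                if np.any(lab == m):
                    centers[m] = X[lab == m].mean(axis=0)
        # (3): re-align each window with its mean profile over small shifts
        new_pos = pos.copy()
        for r, p in enumerate(pos):
            best = -np.inf
            for s in range(-2, 3):
                q = min(max(p + s, 0), len(y) - K)
                c = np.corrcoef(y[q:q + K], centers[lab[r]])[0, 1]
                if c > best:
                    best, new_pos[r] = c, q
        if np.array_equal(new_pos, pos):           # (4): convergence
            break
        pos = new_pos
    return pos, np.stack([y[p:p + K] for p in pos])
```

The returned windows play the role of the modelled data set $\{W_r\}$ in what follows.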
4.1.3 Dirichlet mixtures for the signatures
Dealing with the presence of the noise requires an assumption about the noise distribution. If this imposes too much restriction, the quality of the estimates may suffer, so we should prefer as flexible a specification as possible. A way to obtain such flexible modelling of the noise is offered by mixtures of Dirichlet processes, which were introduced as priors for nonparametric Bayesian analysis by Ferguson (1973), Antoniak (1974), Dalal (1978) and Dalal and Hall (1980).
The Dirichlet distribution is defined by the density

$$f(x) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}, \qquad (4.1)$$

$$x \in S = \left\{x : \sum_{i=1}^{K} x_i = 1,\; x_i > 0\right\}, \qquad \alpha_i > 0, \quad i = 1, \ldots, K.$$

The Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_K)$ will be denoted by $\mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$. Its main properties are:

(D1) if $(W_1, \ldots, W_K) \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$, and $r_1, \ldots, r_l$ are integers such that $0 < r_1 < \cdots < r_l < K$, then

$$\left( \sum_{i=1}^{r_1} W_i,\; \sum_{i=r_1+1}^{r_2} W_i,\; \ldots,\; \sum_{i=r_l+1}^{K} W_i \right) \sim \mathrm{Dir}\left( \sum_{i=1}^{r_1} \alpha_i,\; \sum_{i=r_1+1}^{r_2} \alpha_i,\; \ldots,\; \sum_{i=r_l+1}^{K} \alpha_i \right),$$

and in particular, the marginal distributions of the $W_j$ are beta distributions:

$$W_j \sim \mathrm{Be}\left( \alpha_j,\; \sum_{i=1}^{K} \alpha_i - \alpha_j \right);$$

(D2) if $(W_1, \ldots, W_K) \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$, then

$$E(W_k) = \frac{\alpha_k}{\sum_{i=1}^{K} \alpha_i}, \qquad E(W_k^2) = \frac{\alpha_k (\alpha_k + 1)}{\sum_{i=1}^{K} \alpha_i \left( \sum_{i=1}^{K} \alpha_i + 1 \right)}, \qquad E(W_k W_j) = \frac{\alpha_k \alpha_j}{\sum_{i=1}^{K} \alpha_i \left( \sum_{i=1}^{K} \alpha_i + 1 \right)}.$$
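Properties (D1) and (D2) are easy to check by simulation; the following snippet (our own illustration, not part of the thesis) verifies the mean formula and the aggregation property for a three-dimensional Dirichlet distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])
W = rng.dirichlet(alpha, size=100_000)

# (D2): E(W_k) = alpha_k / sum(alpha)
assert np.allclose(W.mean(axis=0), alpha / alpha.sum(), atol=5e-3)

# (D1): the aggregated pair (W_1 + W_2, W_3) is Dir(alpha_1 + alpha_2, alpha_3),
# so W_1 + W_2 ~ Be(5, 5); check its mean and variance
agg = W[:, 0] + W[:, 1]
assert abs(agg.mean() - 0.5) < 5e-3
assert abs(agg.var() - 25.0 / (100.0 * 11.0)) < 2e-3
```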
The notion of Dirichlet processes was introduced by Ferguson (1973).

Definition 4.1. Let $E$ be a state space and $\mathcal{E}$ a $\sigma$-algebra of subsets of $E$. Let $\alpha$ be a Radon measure on $\mathcal{E}$. Then a random probability measure $P$ on $(E, \mathcal{E})$ is a Dirichlet process on $(E, \mathcal{E})$ with parameter $\alpha$, denoted $P \sim D(\alpha)$, if for all $k = 1, 2, \ldots$ and all measurable partitions $(A_1, \ldots, A_k)$ of $E$, the random vector $(P\{A_1\}, \ldots, P\{A_k\})$ has a Dirichlet distribution with parameter $(\alpha\{A_1\}, \ldots, \alpha\{A_k\})$.

The Dirichlet process thus defined satisfies the Kolmogorov consistency conditions (Kolmogorov, 1933), and has the following properties:

(DP1) If $P \sim D(\alpha)$ and $A \in \mathcal{E}$, then $E\{P(A)\} = \alpha(A)/\alpha(E)$;

(DP2) If $P \sim D(\alpha)$, then $P$ is almost surely discrete.
Antoniak (1974) defined mixtures of Dirichlet processes, by extending the measure $\alpha$ into a transition kernel and introducing a mixing measure $H$.

Definition 4.2. Let $(E, \mathcal{E})$ and $(U, \mathcal{U})$ be two measurable spaces. A transition kernel on $U \times \mathcal{E}$ is a mapping $\alpha$ from $U \times \mathcal{E}$ into $[0, \infty)$ such that

(a) for every $u \in U$, $\alpha(u, \cdot)$ is a finite nonnegative nonnull measure on $(E, \mathcal{E})$;

(b) for every $A \in \mathcal{E}$, $\alpha(\cdot, A)$ is measurable on $(U, \mathcal{U})$.

Definition 4.3. Let $(E, \mathcal{E})$ be a measurable space, $(U, \mathcal{U}, H)$ a probability space called the index space, and let $\alpha : U \times \mathcal{E} \to [0, \infty)$ be a transition measure. We say that $P$ is a mixture of Dirichlet processes on $(E, \mathcal{E})$ with mixing distribution $H$ and transition measure $\alpha$, if for all $k = 1, 2, \ldots$ and any measurable partition $A_1, \ldots, A_k$,

$$\Pr\{P(A_1) \le y_1, \ldots, P(A_k) \le y_k\} = \int_U D\{y_1, \ldots, y_k \mid \alpha(u, A_1), \ldots, \alpha(u, A_k)\}\, dH(u),$$

where $D\{y_1, \ldots, y_k \mid \alpha_1, \ldots, \alpha_k\}$ denotes the distribution function of the Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_k$.
Taking a finite discrete measure $H = \sum_{l=1}^{m} \pi_l \delta_{u_l}$, we get the finite Dirichlet mixtures:

$$\Pr\{P(A_1) \le y_1, \ldots, P(A_k) \le y_k\} = \sum_{l=1}^{m} \pi_l\, D\{y_1, \ldots, y_k \mid \alpha_l(A_1), \ldots, \alpha_l(A_k)\}, \qquad (4.2)$$

where $\alpha_l(A_k)$ is shorthand for $\alpha(u_l, A_k)$.
The Dirichlet process, and also the mixture of Dirichlet processes, is thus a probability measure defined on the space $M(E)$ of the probability measures on $E$. They are therefore mostly used in Bayesian analysis as prior distributions. The other important fact is that they can approximate any probability distribution on a wide range of spaces with arbitrary precision. Dalal (1978) and Dalal and Hall (1980) prove that if $E$ is a well-behaved (that is, compact Hausdorff or Polish) space, then the closure of the space of all Dirichlet mixtures in the weak sense is equal to the space $M(M(E))$ of the probability measures on $M(E)$. The Dirichlet mixtures are thus dense in $M(M(E))$: in any open neighbourhood of a probability measure on $M(E)$, there is a mixture of Dirichlet processes. In a Bayesian context, this is called the adequacy of the Dirichlet mixtures.
Consider now a signature $c_{lk} = a_{lk}/\sum_{i=1}^{K} a_{li}$ for a given $l$. It can be mapped into the space $M(E)$ so that it defines a probability measure on the space $E = \{1, \ldots, K\}$: let $P_l(j) = c_{lj}$ and $P_l(I) = \sum_{j \in I} c_{lj}$ for all possible subsets $I \subset E$. Also, $P_l(E) = 1$ by definition of the signatures, and $P_l(I) > 0$ for all non-empty $I \subset E$. The signature $c_{lk}$ can for example be thought of as the parameter vector of a multinomial distribution with $K$ outcomes. It would be perfectly known if we could observe the limiting pure M3 process, since then we would find the precise signature, and in this case, its distribution in $M(M(E))$ could be represented by the point mass $\delta_{(c_{l1}, \ldots, c_{lK})}(\cdot)$ concentrated at $c_{lk}$. When the limiting M3 process has $L$ signatures, their distribution can be expressed as $\sum_{l=1}^{L} p_l \delta_{(c_{l1}, \ldots, c_{lK})}(\cdot)$, with $p_l$ the frequency of the signature $l$.
The noise, however, distorts these infinitely sharp signatures, and what we observe at finite thresholds shows variability around the limiting values. The Dirichlet mixtures provide a flexible semiparametric way of modelling the deformation. On $M(E)$, with $E = \{1, \ldots, K\}$, the mixture of Dirichlet processes can be represented by a mixture of Dirichlet distributions. We will assume furthermore that the noise distribution $g_l(x)$ of the signature $l$ is well approximated by a finite Dirichlet mixture of the form (4.2), with density

$$g_l(x) = \sum_{j=1}^{m_l} p'_{lj} f_{lj}(x), \qquad 0 \le p'_{lj} \le 1, \quad \sum_{j=1}^{m_l} p'_{lj} = 1,$$

for the signature $l$, where every component $f_{lj}(x)$ has the Dirichlet form (4.1). The mixture is unique up to permutations of the components $\{1, \ldots, m_l\}$. The noisy M3 model thus is given by

$$g(x) = \sum_{l=1}^{L} p_l g_l(x) = \sum_{l=1}^{L} p_l \sum_{j=1}^{m_l} p'_{lj} f_{lj}\left(x;\, \alpha^{(l)}_{j1}, \ldots, \alpha^{(l)}_{jK}\right), \qquad (4.3)$$

$$0 \le p_l,\, p'_{lj} \le 1, \qquad \sum_{l=1}^{L} p_l = 1, \qquad \sum_{j=1}^{m_l} p'_{lj} = 1,$$

where $f_{lj}\left(x;\, \alpha^{(l)}_{j1}, \ldots, \alpha^{(l)}_{jK}\right)$ is a Dirichlet density (4.1), with parameters $\alpha^{(l)}_{jk}$, $k = 1, \ldots, K$, for all $l = 1, \ldots, L$ and $j = 1, \ldots, m_l$. This can be written in the simpler form

$$g(x) = \sum_{m=1}^{M} \pi_m f_m(x;\, \alpha_{m1}, \ldots, \alpha_{mK}), \qquad (4.4)$$

with $\{\pi_m,\, m = 1, \ldots, \sum_{l=1}^{L} m_l\} = \{p_1 p'_{11}, \ldots, p_1 p'_{1m_1}, \ldots, p_L p'_{L1}, \ldots, p_L p'_{Lm_L}\}$ and $\{f_m,\, m = 1, \ldots, \sum_{l=1}^{L} m_l\} = \{f_{11}, \ldots, f_{1m_1}, \ldots, f_{L1}, \ldots, f_{Lm_L}\}$.
The model (4.4) is appropriate for estimation, though it is not the same as the pure M3 model. The Dirichlet mixture is not max-stable. Also, in general there is more than one component in the Dirichlet mixture corresponding to one signature. In order to keep the distinction clear, we will term the components of the Dirichlet mixture "types", as opposed to the "signatures" in the M3 model. Assuming that the observed neighbourhoods $W_r$ are a sample of size $R$ from a Dirichlet mixture with an unknown number of components, describing a noisy M3 process, we can estimate the probabilities $\pi_m$ of the types and the Dirichlet parameters $\alpha_{mk}$, $m = 1, \ldots, M$. Estimation of the parameters of mixtures is a common task in statistics, and an appropriate method for our specification is the EM algorithm (Dempster et al. (1977); see a summary in Appendix B), which can be easily implemented for the Dirichlet mixture.
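Appendix B describes the EM algorithm in full; as a rough illustration only, the sketch below implements EM for a Dirichlet mixture with a method-of-moments M-step in place of the exact likelihood maximization for the Dirichlet parameters. This simplification, the farthest-point initialization and all function names are ours, not the thesis's procedure.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_logpdf(X, alpha):
    # X: (n, K) points in the open simplex; alpha: (K,) parameters
    return (gammaln(alpha.sum()) - gammaln(alpha).sum()
            + ((alpha - 1.0) * np.log(X)).sum(axis=1))

def em_dirichlet_mixture(X, M, n_iter=50, seed=0):
    """EM for an M-component Dirichlet mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, K = X.shape
    pi = np.full(M, 1.0 / M)
    # farthest-point initialization of the component centres
    idx = [int(rng.integers(n))]
    for _ in range(M - 1):
        d = ((X[:, None, :] - X[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
    alpha = 10.0 * X[idx] + 0.5
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point
        logp = np.stack([np.log(pi[m]) + dirichlet_logpdf(X, alpha[m])
                         for m in range(M)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weights, then moment-matched Dirichlet parameters
        pi = R.mean(axis=0)
        for m in range(M):
            w = R[:, m] / R[:, m].sum()
            m1 = w @ X                 # weighted first moments
            m2 = w @ X ** 2            # weighted second moments
            s0 = np.median((m1 - m2) / (m2 - m1 ** 2))  # precision estimate
            alpha[m] = np.maximum(m1 * max(s0, 1e-3), 1e-3)
    return pi, alpha
```

The moment-matched update exploits property (D2): $E(W_k) = \alpha_k/\alpha_0$ and $E(W_k^2) = \alpha_k(\alpha_k+1)/\{\alpha_0(\alpha_0+1)\}$, from which the precision $\alpha_0$ can be recovered coordinate by coordinate.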
In the extreme-value limit, when $n \to \infty$ and $u_n \to \infty$, the Dirichlet mixture modelling the neighbourhoods of exceedances above $u_n$ is expected to tend to the point mass mixture measure $\sum_{l=1}^{L} p_l \delta_{(c_{l1}, \ldots, c_{lK})}(\cdot)$. From property (DP1), this suggests that for each type $m$, there is a signature $l$ such that

$$\frac{\alpha_{mk}}{\sum_{i=1}^{K} \alpha_{mi}} \to c_{lk}, \qquad l = 1, \ldots, L, \quad k = 1, \ldots, K.$$

Moreover, if $M_l = \{m : \alpha_{mk}/\sum_{i=1}^{K} \alpha_{mi} \to c_{lk}\}$, then $\sum_{m \in M_l} \pi_m \to p_l$. We may also expect the variance to tend to zero, that is, as $n \to \infty$,

$$\alpha^{(l)}_{jk} \to \infty, \qquad l = 1, \ldots, L, \quad k = 1, \ldots, K, \quad j = 1, \ldots, m_l.$$
This limit cannot be expected to be attained in practice. The noise at commonly used thresholds can cause large variability of the observed clusters, and can require a Dirichlet mixture description with a large number of components for each signature. At the same time, the number of observed clusters $W_r$ can be very limited. This implies a bias-variance trade-off similar to the usual situation in extreme-value statistics, and it means we may get only a rough picture of the limiting M3 process, either with large bias or with large variance.
To estimate the filter matrices $a_{lk}$ of a limiting M3 process, we must first estimate the signatures $c_{lk}$ and the probabilities $p_l$. An obvious way to do so is to estimate the profile $c^*_{mk}$ for each Dirichlet type by

$$\hat c^*_{mk} = \frac{\alpha_{mk}}{\sum_{i=1}^{K} \alpha_{mi}}, \qquad m = 1, \ldots, M, \quad k = 1, \ldots, K.$$

To obtain the signature probabilities $p_l$, we should find the groups $M_l$ of the types that are very similar, and then put

$$\hat p_l = \sum_{m \in M_l} \pi_m.$$

The final step would be to estimate the M3 signatures from the type profiles $c^*_{lk}$ by, for instance,

$$\hat c_{lk} = \frac{1}{m_l} \sum_{m \in M_l} \hat c^*_{m}, \qquad l = 1, \ldots, L,$$

where $m_l$ is the number of profiles in $M_l$. This method would require an objective decision procedure to find the groups $M_l$, which is a difficult task, since we should determine which of the obtained types will converge to the same signature in the limit. However, a simple heuristic argument suggests that for the calculation of the extremal index, finding the groups $M_l$ may be avoided. Define $a^*_{mk} = \pi_m \alpha_{mk}/\sum_i \alpha_{mi}$; that is, we treat all the types as if they corresponded to distinct signatures, and calculate the filter matrix estimates for a corresponding M3 process. Let

$$\theta^* = \sum_{m=1}^{M} \max_k a^*_{mk}.$$

Then, taking the same limit as above,

$$\lim_{n \to \infty} \theta^* = \lim_{n \to \infty} \sum_{m=1}^{M} \max_k a^*_{mk} = \lim_{n \to \infty} \sum_{m=1}^{M} \max_k \frac{\pi_m \alpha_{mk}}{\sum_i \alpha_{mi}} = \sum_{l=1}^{L} \sum_{m \in M_l} \lim_{n \to \infty} \max_k \frac{\pi_m \alpha_{mk}}{\sum_i \alpha_{mi}} = \sum_{l=1}^{L} p_l \max_k c_{lk} = \sum_l \max_k a_{lk} = \theta.$$

This short calculation suggests that in order to estimate the extremal index, we can formally treat the obtained types $\hat c^*_{mk}$ as signatures, the probabilities $\pi_m$ as the frequencies of the signatures, and simply apply the formula for the extremal index of an M3 process defined by these parameters.
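For an estimated mixture, this heuristic is a two-line computation; the sketch below (function name ours) forms $a^*_{mk} = \pi_m \alpha_{mk}/\sum_i \alpha_{mi}$ and sums the row maxima.

```python
import numpy as np

def m3_extremal_index(pi, alpha):
    """Extremal index implied by treating the Dirichlet types as M3
    signatures: a*_mk = pi_m * alpha_mk / sum_i alpha_mi,
    theta* = sum_m max_k a*_mk."""
    a = pi[:, None] * alpha / alpha.sum(axis=1, keepdims=True)
    return a.max(axis=1).sum()
```

For instance, with weights $(0.5, 0.5)$ and type profiles proportional to $(8, 1, 1)$ and $(1, 1, 8)$, the row maxima are $0.4$ each, giving $\theta^* = 0.8$.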
An assessment of the quality of the fit is possible using beta quantile-quantile plots, based on the beta margins of the Dirichlet distribution, when the fitted model is likely to have only one component for each signature. The Dirichlet model then corresponds to assuming gamma-distributed variables on the original scale $Y_i$. For every sample cluster $W_r$, we calculate the posterior likelihood $\pi_m f_m(W_r;\, \alpha_{m1}, \ldots, \alpha_{mK})$ of belonging to each of the types $m = 1, \ldots, M$, and we classify it to the type for which this is largest. We obtain thus the populations $\mathcal{W}_m = \{W_r : W_r \text{ belongs to type } m\}$ for all $m$. Then separately for each $m$, we can plot the components $W_{rk}$ of every cluster $W_r \in \mathcal{W}_m$ against the quantiles of the beta distribution $\mathrm{Be}\left(\alpha_{mk},\, \sum_{i=1}^{K} \alpha_{mi} - \alpha_{mk}\right)$ for all $k = 1, \ldots, K$. In the case of a good fit, we expect to see approximately straight lines. However, when there is more than one mixture component in the fitted model corresponding to one signature, there is no point in comparing them directly to the beta quantiles. In such a case, the mixture model corresponds to a semiparametric way of modelling the noise around cluster types.
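The beta quantile-quantile check just described can be sketched as follows, using the marginal $\mathrm{Be}(\alpha_{mk}, \sum_i \alpha_{mi} - \alpha_{mk})$ distribution; the function name and the choice of plotting positions are ours.

```python
import numpy as np
from scipy.stats import beta

def beta_qq_points(w_col, alpha_m, k):
    """QQ points for the k-th coordinate of the clusters classified to
    type m: sorted data against the quantiles of
    Be(alpha_mk, sum_i alpha_mi - alpha_mk)."""
    a = alpha_m[k]
    b = alpha_m.sum() - a
    x = np.sort(np.asarray(w_col))
    p = (np.arange(1, x.size + 1) - 0.5) / x.size   # plotting positions
    return beta.ppf(p, a, b), x
```

Plotting the two returned arrays against each other should give an approximately straight line under a good fit.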
4.2 Estimation
4.2.1 Procedure
The summary of the proposed procedure is the following:
(1) Transform the sequence $Y_i$ to the unit Fréchet scale. This involves a classical GPD analysis of the sequence.

(2) Select clusters, that is, neighbourhoods of exceedances of a sufficiently high threshold, by the procedure described in Section 4.1.2. Calculate the realized signatures in the process by dividing each vector of neighbourhoods by its sum. These comprise the extremal clusters of the process.
(3) For the scaled clusters, fit the Dirichlet mixture model by the EM algorithm, estimating the Dirichlet parameters of the noise. Try a few plausible values M = 1, ..., M_max for the number of cluster types. Launch the procedure with a number of initial value combinations for all M.
(4) Estimate the filter matrix parameters by

\[
\hat a_{mk} = \frac{\hat\pi_m \hat\alpha_{mk}}{\sum_{i=1}^{K} \hat\alpha_{mi}}, \qquad m = 1, \ldots, M, \quad (4.5)
\]

where α_mk and π_m are respectively the Dirichlet parameters and the mixture component probabilities of the selected fit. This yields the parameter estimates for the presumed limiting M3 process.
(5) Estimate the extremal index by

\[
\hat\theta = \sum_m \max_k \hat a_{mk}, \quad (4.6)
\]

and other desired cluster characteristics by combining information from the GPD analysis of the extremes of the sequence and the estimated filter matrices a_mk.
(6) Variances for the estimates can be based on the delta method and the asymptotic normality of the estimates from the EM algorithm, but can be misleading for a number of reasons: the Dirichlet mixture assumption for the noise is not always good; due to asymmetry, the quadratic approximation to the likelihood as a function of the parameters a_mk of the filter matrix is poor near 0, the boundary of the domain; and joint normality of the a_mk is a crude approximation because of the constraint Σ_{m=1}^M Σ_{k=1}^K a_mk = 1 on the matrix elements. A slight improvement could be obtained by using a sandwich information matrix for the Dirichlet coefficients α_mk, but this does not improve the aspects related to the asymmetry of the likelihood and the constraint. Obtaining bootstrap standard errors could be another possibility, but it raises other issues, since estimates on the repetitions can converge to local maxima instead of the global one. Also, the EM algorithm is quite time-consuming, and repeating it enough times to obtain reliable bootstrap confidence intervals may be practically unfeasible in large problems. The variance estimates given in the thesis are all based on asymptotic normality.
(7) The quality of the Dirichlet modelling for the noise can be checked by diagnostic quantile-quantile plots.
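Steps (4) and (5) reduce to a few array operations once the mixture fit is available. A minimal sketch, with hypothetical mixture weights and Dirichlet parameters standing in for an EM fit:

```python
import numpy as np

def m3_filter_and_theta(pi, alpha):
    """Filter-matrix rows a_mk = pi_m * alpha_mk / sum_i alpha_mi (eq. 4.5)
    and the extremal index estimate theta = sum_m max_k a_mk (eq. 4.6)."""
    pi = np.asarray(pi, dtype=float)        # mixture weights, length M
    alpha = np.asarray(alpha, dtype=float)  # Dirichlet parameters, shape (M, K)
    a = pi[:, None] * alpha / alpha.sum(axis=1, keepdims=True)
    return a, a.max(axis=1).sum()

# Hypothetical two-component fit with K = 3.
a, theta = m3_filter_and_theta([0.6, 0.4], [[4.0, 2.0, 2.0],
                                            [1.0, 1.0, 8.0]])
print(round(float(a.sum()), 6))   # 1.0: row m of a sums to pi_m, so a sums to one
print(round(theta, 3))            # 0.62
```

The check that the rows of a sum to the mixture weights reflects the constraint on the filter matrix noted in step (6).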
4.2.2 Simulations
To obtain a first impression of the performance and the possible problems of the method, we estimated the signatures and the univariate extremal index of the three examples of Section 3.1.5. The questions were whether the Dirichlet mixtures provide adequate modelling of the variability at finite thresholds, whether we can see hints of the expected shrinking of the variability around the limiting M3 signatures, and how the estimated extremal indices depend on the model complexity M and the threshold u. A full simulation study is, unfortunately, infeasible with the present procedure. The time requirements of the large number of fits needed to find the global maximum of the likelihood, and the handwork needed for the preliminary cluster identification, preclude performing the procedure on more than a few repetitions.
We used three thresholds: F(u) = 0.95 with sample size 5,000; F(u) = 0.98 with sample size 12,000 for the AR processes and 8,000 for the Markov chain; and finally F(u) = 0.998 with sample size 120,000 for the AR processes and 110,000 for the Markov chain. These combinations yielded 60-80 clusters, which were then classified with the pam procedure of the package cluster of R (R Development Core Team, 2004). This procedure was used in the preliminary declustering step for all simulated or data examples of the thesis. The run parameter was 4 for the AR(1) process and the Markov chain, and 7 for the AR(2) process.
The model was fitted to the clusters of each simulated process above each threshold with M = 1, ..., 6, and with at least 50 different sets of initial values for M = 1, ..., 4 and 150 sets in the case of M = 5, 6. Launching the process from several initial value combinations is desirable,
Figure 4.1: Results of the Dirichlet mixture modelling for the AR(1) process. The left panel in the upper row shows a part of the Fréchet-transformed series on the logarithmic scale. The right panel in the upper row presents the BIC of the fits with different initial values, as a function of M. The bottom row shows the four rows of a_mk (solid lines) as a function of k, with 95% confidence intervals (dashed lines), for the best four-component model (component probabilities p1 = 0.32, p2 = 0.23, p3 = 0.38, p4 = 0.08).
because the mixture likelihoods generally admit a number of local maxima beside the global maximum. A full survey of the DKM-dimensional parameter space is impossible, so we chose initial values from only a subspace of it, using the results of the preliminary cluster identification as guidelines. The preliminary declustering gives a rough classification of the observed clusters, together with the typical cluster profile for each type. The mean α_mk / Σ_i α_mi of each Dirichlet component can be inferred from these typical profiles. The average of the pointwise variances of the observed clusters belonging to type m was then used to obtain an approximate idea α̃_mk about the true Dirichlet parameter α_mk. We generated random gamma variables A_mk for each position k = 1, ..., K and for each m = 1, ..., M, such that the expected value at (m, k) was equal to α̃_mk, and as an initial value for α_mk, we took max{0.15, A_mk} to ensure that in the first few steps, the optimization algorithm does not quit the parameter space (0, ∞)^DKM. The optimization was performed with the optim procedure of R, with the method L-BFGS-B and imposing the constraints α_mk > 0.1 for all m, k instead of α_mk > 0, to avoid extremely
Figure 4.2: Results of the Dirichlet mixture modelling for the AR(2) process. The left panel in the upper row shows a part of the Fréchet-transformed series on the logarithmic scale. The right panel in the upper row presents the BIC of the fits with different initial values, as a function of M. The bottom row shows the four rows of a_mk (solid lines) as a function of k, with 95% confidence intervals (dashed lines), for the best four-component model (component probabilities p1 = 0.17, p2 = 0.44, p3 = 0.13, p4 = 0.25).
large values of the gamma functions occurring during the optimization. We found no fits with estimated parameter values on the boundary among the best models, so this constraint does not seem over-restrictive.
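The gamma-based starting values can be sketched as below. The shape/scale parametrization (shape = α̃_mk, scale = 1, so that the mean equals α̃_mk) is an assumption, as are the illustrative α̃ values:

```python
import random

def initial_alpha(alpha_tilde, floor=0.15, seed=1):
    """Random starting values A_mk ~ Gamma with mean alpha_tilde[m][k],
    floored at 0.15 so the optimizer begins safely inside (0, inf)."""
    rng = random.Random(seed)
    return [[max(floor, rng.gammavariate(a, 1.0)) for a in row]
            for row in alpha_tilde]

# Hypothetical rough Dirichlet means inferred from the preliminary declustering.
alpha_tilde = [[3.0, 1.0, 0.5],
               [0.5, 4.0, 2.0]]
start = initial_alpha(alpha_tilde)
print(all(v >= 0.15 for row in start for v in row))   # True
```

Repeating this with different seeds yields the many initial value combinations from which the EM runs are launched.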
With higher M, many initial value combinations led to solutions with a near-singular information matrix: the solution contained a cluster type that had only one element. Admissible solutions were required to contain only cluster types with populations of more than 3 (practically, 5% of the total number of clusters in most cases), so that for higher M values, many solutions had to be rejected. We calculated the BIC for all the admissible models, and in Figures 4.1, 4.2 and 4.3, all these fits are shown on the BIC plots for the clusters above the threshold F(u) = 0.98.
The lower panels of Figures 4.1, 4.2 and 4.3 show the rows of the estimated filter matrix from the fits corresponding to the lowest value of the BIC: four-component models for the two autoregressive processes, and a two-component model for the Markov chain, though the two
Figure 4.3: Results of the Dirichlet mixture modelling for the Markov chain. The top plot shows a part of the Fréchet-transformed series on the logarithmic scale. The leftmost panel in the bottom row presents the BIC of the fits with the used set of initial values, as a function of L. The middle and the right panels in the bottom row show respectively the rows of the estimated filter matrix a_1k and a_2k of the best two-component model (solid lines) and their 95% confidence intervals (dashed lines), as a function of k (component probabilities p1 = 0.38, p2 = 0.62).
autoregressive processes are expected to have in the limit only one extremal cluster profile. What we obtained is thus not the filter matrix of the M3 process. The first two panels at the bottom of Figure 4.1 reveal two almost identical filter matrix rows, and comparison of the confidence intervals in the first two and the third panel suggests that the third row is not significantly different from them, either. However, the Dirichlet parameter estimates were strongly different for these types. The rows of the fitted matrix a_mk represent the types of the Dirichlet mixture, yielding a semiparametric decomposition of the variability entailed by the use of a finite threshold.
An illustration of how the mixture model fits the noise at increasing thresholds is given in Figure 4.4 using the AR(1) example. The top row shows the estimated profiles α_mk / Σ_i α_mi of each Dirichlet component m = 1, ..., M using the 0.95-quantile as threshold for selecting
Figure 4.4: The estimated signatures (black dots) of the AR(1) process for thresholds F(u) = 0.95 (top row), F(u) = 0.98 (middle row) and F(u) = 0.998 (bottom row), against the theoretical cluster profile (blue circles) and the observed clusters (light gray bundles). Each observed cluster is plotted in the panel corresponding to its posterior classification. The vertical segments are the 0.975- and 0.025-quantiles of the fitted model component.
Figure 4.5: The dependence of the estimated extremal index (dots) on the number M of cluster types and on the threshold u, against the true value (thin horizontal lines). The estimates corresponding to F(u) = 0.95 are shown in blue, those to F(u) = 0.98 in violet, and those to F(u) = 0.998 in black. The circles are the estimates from the models with the highest likelihood, that is, with the lowest BIC for the given M, the filled blobs are the best models selected by the BIC, and the short horizontal bars denote approximate 95% confidence intervals. The AR(1) model is shown in the left panel, the AR(2) in the middle panel, the Markov chain in the right.
exceedances, the middle row for the 0.98-quantile (for which the filter matrix is presented in Figure 4.1), and the bottom row for the 0.998-quantile. The single theoretical M3 signature is plotted as blue circles on every panel: the convergence of the method can be traced from the top to the bottom row, as the fitted signatures approach the theoretical values. We also added a measure of variability at every location k: a vertical segment indicating the region between the 0.025- and 0.975-quantiles of the fitted Dirichlet component. If the Dirichlet mixture models the noise adequately, the observed normalized clusters should mostly pass within this segment, with only a few excursions. In order to judge whether this is so, we calculated the posterior classification of each observed cluster, and plotted them in the corresponding panel as light gray lines. The plots show the decreasing variability with increasing thresholds, as the limiting M3 model is slowly approached. Most of the observed clusters are within the region between the quantiles, so the mixture model seems to give a faithful picture of the variability for the AR(1) process, though there might be slightly more excursions than expected, especially at the peaks. This kind of plot can be used as a model diagnostic, comparing the observed clusters with the variability allowed by the fit.
The dependence of the estimated extremal index on the number of mixture components and on the threshold is presented in Figure 4.5. The two higher thresholds give estimates quite close to
Figure 4.6: The estimated extremal index (black circles) for all admissible models obtained using threshold F(u) = 0.98, and the true value (thin horizontal lines). The estimates from the models with the highest likelihood are plotted as violet circles, and the estimate from the model selected by the lowest BIC is the filled violet blob. The AR(1) model is shown in the left panel, the AR(2) in the middle panel, the Markov chain in the right.
the theoretical value; the relative bias is less than 12.5% in every case, except for M = 2 with F(u) = 0.998 for the Markov chain. The lower threshold, F(u) = 0.95, gives clearly downward-biased estimates for the two autoregressive processes. The confidence intervals do not always contain the true value, but this may be a consequence of the poorness of the variance estimate.
Plotting the estimates from all admissible models obtained with different initial values, using the 0.98-quantile as threshold, shows that the best Dirichlet mixture model selected by the BIC generally does not yield the best extremal index estimate (Figure 4.6). The estimates, however, form quite tight groups for each mixture complexity M. Thus, the point seems to be to find a criterion able to select the value of M which yields the tightest group of estimates around the true extremal index. The performance of the Bayes information criterion does not seem unambiguously good from this point of view. However, the use of such models may be broader than estimation of the extremal index. Using the fact that the distribution of the cluster peaks is known, and that the Dirichlet mixture yields a model for the neighbourhoods of an exceedance, it is possible to develop methods for the estimation of cluster characteristics such as the number of exceedances or the total excess in a cluster. Analysis of other functionals and risk analysis may also be formulated based on the conditional distributions. This raises the question of how to select the model closest to the limiting M3 process: it may be different from the model closest to a given finite mixture of Dirichlet distributions, which is constructed to provide an acceptable description of the finite-threshold noise. In this sense, our Dirichlet mixture model is indeed more a model for the noise distribution at a given finite level than a model for the limiting M3 process, and the BIC selects the best mixture for the noise, not the best limiting M3 process. More studies are necessary to survey comprehensively the behaviour of the BIC and of other criteria, such as the AICc, and the development of other criteria, specific to the problem, may also be useful.
4.3 Summary
The method presented in this chapter is based on the M3 approximation. It was originally proposed in the context of the estimation of the multivariate extremal index by Smith and Weissman (1996), and it provides a complete picture of the trajectory of the process around the extremes. Combined with the results of the marginal GPD estimation necessary to transform the sequences to have unit Fréchet margins, it should make it possible to obtain estimates of cluster functionals, risk and loss estimates, or return level calculations. Thus, it is worthwhile to investigate its potential in the univariate case too.
Simulated examples showed the ability of the M3 approximation to find the characteristic cluster profiles when these exist, as in the case of the autoregressive processes. The estimates for the extremal index in all examples are quite close to the theoretical value. The extremal index is only a summary measure of serial dependence at extreme levels, so unfortunately this is not equivalent to a good identification of the signatures of the limiting M3 process: it can be considered only as an indication of the possible usefulness of the method in the estimation of cluster functionals. This requires further investigation and comparison with the results of other methods. The method has a few drawbacks, most importantly the poorness of the uncertainty assessment, the absence of a well-functioning selection criterion for L, and the necessity of the preliminary cluster identification. Possible solutions to these include a Bayesian approach to the problem (Richardson and Green, 1997). If these problems can be solved, the method provides on the one hand a limiting model for the extrapolation to very high thresholds by furnishing the limiting cluster signatures, and on the other hand yields a model for the finite-threshold noise distribution, accounting for the distortion due to being far from the region of asymptotic validity. This is not included as a rule in extreme-value analyses: the asymptotic GEV or GPD models, once a threshold is chosen, are treated as true models.
Chapter 5
Multivariate methods
This chapter discusses the estimation of clustering in the multivariate case: the two new methods are adapted to deal with multidimensional data sets.
In Section 5.1, we extend the univariate likelihood methods to the multivariate case, by defining appropriately a collection of univariate sequences and estimating their extremal indices. This implies the implementation of the necessary misspecification tests on a grid in a high-dimensional space, which makes its application awkward. A bivariate example illustrates the method. A bivariate data set is also considered in Section 5.2, in which the misspecification tests again demonstrate their power to detect the breaking of the fundamental assumptions and to select an appropriate run parameter-threshold combination.
Section 5.3 presents the methodology based on the M4 approximation for the extreme clusters of a multivariate time series. After a summary of the background for the procedures in a multivariate setting, we discuss the implications of the multivariate setting for the proposed estimation procedure. We give a simulation example, on which we compare the results with those from a pointwise application of the univariate methods. Section 5.4 presents the analysis of the data set of Section 5.2, here using M4 methods.
5.1 Pointwise methods
5.1.1 Estimation procedure
Using Property T5 of the extremal index function presented in Section 2.2.2, the extremal index function of a D-variate process can be estimated by any univariate method. This might be difficult when D is large, since the extremal index function should in principle be estimated at
every point of a sufficiently dense grid on the D-dimensional unit simplex. The application of the likelihood methods described in Section 3.1 raises a further problem: the choice of the threshold and the run parameter. In a univariate case, this can be done by calculating the surface of the ratio Ĵ(θ̂^(K))/Î(θ̂^(K)), the information matrix test T(θ̂^(K)), or both, as a function of u and K. In a multivariate setting, the calculation of the surfaces should be performed separately at every gridpoint. This is a huge task. If we are able to cope with it, then we may well find that appropriate thresholds and run parameters are different at neighbouring points, which may result in abrupt jumps in the estimates. On the other hand, choosing the threshold and the run parameter to be common on the whole grid can imply different biases in different regions of the grid. Estimation by maximum likelihood is therefore quite problematic, and for tasks of dimension higher than 2, useful only if we can afford some bias in the estimates in exchange for the simplification of the diagnostics.
Such a simplification could for example be to use summaries of the test statistics: calculate the mean and the variance of the statistics over the grid for each pair of thresholds and run parameters. This provides approximate information about the acceptable values, although there can be local regions of misspecification. Under the null hypothesis of absence of misspecification, the test statistic T(θ̂) follows a χ²₁ distribution at every gridpoint, but the strong correlation between their values at neighbouring points prevents using the central limit theorem to derive a standard test statistic. This is so for the ratio J(θ̂^(K))/I(θ̂^(K)) as well: strong correlation does not allow the use of a central limit theorem for its standardized variant. Thus, we must base our decision about thresholds and run parameters on more or less intuitive judgment.
Suppose that we want to estimate the extremal index function of a D-variate stationary sequence Y_t by maximum likelihood. Denote the components of Y_t by Y_td (d = 1, ..., D). The estimation procedure can be summarized as follows:
(1) Define a grid ω_i, i = 1, ..., M on the D-dimensional unit simplex that is dense enough for our purposes. For each gridpoint ω_i = (ω_i1, ..., ω_iD), construct the univariate sequence V_t(ω_i) by

\[
V_t(\omega_i) = \max_{d \in \{1, \ldots, D\}} \omega_{id} Y_{td}, \qquad t \in \mathbb{Z}.
\]

The univariate extremal index θ(ω_i) of V_t(ω_i) is the value of the extremal index function of Y_t at ω_i.
(2) Calculate the sequences of K-gaps S^(K)(u, ω_i) from every series V_t(ω_i) for the thresholds
Figure 5.1: A short section of the bivariate noisy M4 process (left panel) and the extremal index function of the pure M4 (right panel). The two component processes are plotted in blue and violet.
u and run parameters K of interest. Calculate the estimates of θ(ω_i) and the misspecification tests as functions of u and K at every ω_i. If D = 2, it is possible to check the behaviour of these surfaces with respect to ω_i1 with the movie technique we applied to assess nonstationarity in the Neuchâtel daily minimum temperature series. If D > 2, it is better to calculate the grid mean and variance of the test statistic values T(θ̂; ω_i, u, K) for all (u, K) by

\[
\bar T(\hat\theta; u, K) = M^{-1} \sum_{i=1}^{M} T(\hat\theta; \omega_i, u, K),
\]

and to base decisions on these summaries. The main guidelines are to select the pair with the mean test statistic closest to its value under a well-specified likelihood and with low variance.
(3) Accept the θ̂ value with the selected threshold and run parameter as the estimate for every S^(K)(u, ω_i) sequence. The variance of the estimates can be obtained pointwise from asymptotic normality, with the sandwich variance form, since using summary statistics for the parameter selection may imply a locally misspecified likelihood.
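Step (1) of the procedure reduces to one maximum per gridpoint. A minimal sketch, with a toy heavy-tailed sample standing in for the data:

```python
import numpy as np

def pointwise_series(Y, omega):
    """V_t(omega) = max_d omega_d * Y_td: a univariate sequence whose
    extremal index is the extremal index function of Y_t at omega."""
    return (np.asarray(Y, dtype=float) * np.asarray(omega)).max(axis=1)

# For D = 2, the simplex grid is the set of points (omega, 1 - omega).
rng = np.random.default_rng(0)
Y = rng.pareto(1.0, size=(1000, 2)) + 1.0   # toy heavy-tailed bivariate sample
grid = np.linspace(0.1, 0.9, 9)
V = [pointwise_series(Y, (w, 1.0 - w)) for w in grid]
print(len(V), V[0].shape)
```

Each sequence V in the list would then feed the K-gaps construction and the misspecification tests of steps (2) and (3).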
5.1.2 Simulated example
From the few examples in the literature for which the extremal index function can be calculated, we chose the M4 model itself, to which we added noise. The pure M4 process without noise has known signatures, and its extremal index function can be calculated from equation (2.25). We suppose that adding the independent noise does not change the dependence structure of the pure process, so that the extremal index function remains the same. We will use it to demonstrate
the strong and weak points of both the pointwise likelihood method and a multivariate variant of the Dirichlet mixture approach of Chapter 4.
The D-variate stationary series Y_t is an M4 process if it is given by

\[
Y_{td} = \max_k \max_l a_{lkd} Z_{l,t-k}, \qquad d = 1, \ldots, D, \quad t \in \mathbb{Z},
\]

where {Z_li : l ∈ N, i ∈ Z} are sequences of independent random variables with the unit Fréchet distribution, and the coefficients {a_lkd : l ∈ N, k ∈ Z, d = 1, ..., D} are nonnegative constants satisfying

\[
\sum_{l=1}^{\infty} \sum_{k=-\infty}^{\infty} a_{lkd} = 1, \qquad d = 1, \ldots, D.
\]
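Simulating such a process is straightforward. A sketch with a small hypothetical filter array of shape (L, K, D) = (2, 2, 2), each a[:, :, d] summing to one, rather than the matrices used in the thesis:

```python
import numpy as np

def simulate_m4(a, n, seed=0):
    """Simulate Y_td = max_k max_l a[l, k, d] * Z_{l, t-k}, Z unit Frechet."""
    L, K, D = a.shape
    rng = np.random.default_rng(seed)
    Z = -1.0 / np.log(rng.uniform(size=(L, n + K - 1)))   # unit Frechet noise
    Y = np.zeros((n, D))
    for l in range(L):
        for k in range(K):
            # a[l, k, :] pairs with Z_{l, t-k}: array offset K - 1 - k
            Y = np.maximum(Y, a[l, k] * Z[l, K - 1 - k : K - 1 - k + n, None])
    return Y

# Hypothetical filter array; each a[:, :, d] sums to one, as the
# normalization constraint above requires.
a = np.array([[[0.5, 0.1], [0.2, 0.4]],
              [[0.2, 0.3], [0.1, 0.2]]])
Y = simulate_m4(a, 500)
print(Y.shape, bool((Y > 0).all()))   # (500, 2) True
```

The normalization keeps the margins of Y unit Fréchet, since the componentwise maximum of scaled independent Fréchet variables is again Fréchet with scale equal to the sum of the coefficients.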
The process we used in the simulation has the filter matrices

\[
a_{lk1} =
\begin{pmatrix}
0.02 & 0.26 & 0.16 & 0.03 & 0.02 & 0.03 \\
0.08 & 0.13 & 0.02 & 0.02 & 0.01 & 0.01 \\
0.01 & 0.04 & 0.03 & 0.04 & 0.04 & 0.02
\end{pmatrix},
\qquad
a_{lk2} =
\begin{pmatrix}
0.07 & 0.13 & 0.11 & 0.01 & 0.02 & 0.05 \\
0.05 & 0.04 & 0.11 & 0.03 & 0.04 & 0.06 \\
0.04 & 0.05 & 0.05 & 0.07 & 0.04 & 0.05
\end{pmatrix},
\]
which generate a bivariate observable sequence Y_t = (Y_t1, Y_t2) with three different signatures of length 6. We used a sample Y_t of size n = 12000. To add noise, every value of the pure M4 process was replaced by a lognormally distributed variable with mean equal to the pure M4 value and with random variance, stochastically larger for larger M4 values. Figure 5.1 shows a short part of the sequence and the extremal index function of the pure M4 process calculated from

\[
\theta(\omega) = \frac{\sum_l \max_k \max_d \omega_d a_{lkd}}{\sum_l \sum_k \max_d \omega_d a_{lkd}}.
\]
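Taking the formula above at face value, the extremal index function can be evaluated directly from the filter array; the small array here is hypothetical, chosen so that each a[:, :, d] sums to one:

```python
import numpy as np

def theta_m4(a, omega):
    """theta(omega) = sum_l max_k max_d(omega_d a_lkd) /
                      sum_l sum_k max_d(omega_d a_lkd)."""
    b = (a * np.asarray(omega)).max(axis=2)   # max_d omega_d a_lkd, shape (L, K)
    return b.max(axis=1).sum() / b.sum()

# Hypothetical filter array, shape (L, K, D) = (2, 2, 2).
a = np.array([[[0.5, 0.1], [0.2, 0.4]],
              [[0.2, 0.3], [0.1, 0.2]]])
print(round(theta_m4(a, (1.0, 0.0)), 3))   # 0.7: at the margin d = 1 this
                                           # reduces to sum_l max_k a_lk1
```

Evaluating theta_m4 over a grid of simplex points (ω, 1 − ω) would reproduce a curve like the right panel of Figure 5.1.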
The unit simplex on which we want to obtain the estimates of the extremal index function is the segment defined by the coordinates (ω, 1 − ω), 0 ≤ ω ≤ 1, for D = 2; as a shorthand, we will denote this point by ω, and deal with the extremal index function on the interval [0, 1]. After constructing the sequences V_t(ω) by the definition given in Section 5.1.1, and calculating the sequences of K-gaps for (u, K) pairs covering the range F(u) = 0.95, 0.96, 0.97, 0.98, 0.99 and K = 1, ..., 12 for each V_t(ω), we calculated the values of T(θ̂^(K)) at every gridpoint. Then we took the grid mean of T(θ̂^(K)) for every pair (u, K), and to have an idea about the variability over the grid, we calculated also its empirical variance. The results are presented in Figure 5.2.
Figure 5.2: The average value of the information matrix test over the grid, T̄(θ̂) (left panel), and the square root of its variance (right panel), for the noisy M4 example. Horizontal foreground axis: threshold; horizontal left axis: run parameter K; vertical axis: the mean and the standard error of T(θ̂). The blue lines denote the value χ²₁(0.95), though here this is just a reminder of the critical value of pointwise misspecification at the gridpoints.
The test statistic rejects the lower thresholds with K = 1 and 2. In this region, the grid mean of the test statistic is above the pointwise acceptable level, which means that many of the gridpoints must admit misspecified models for these thresholds and run parameters. The model is closest to perfectly specified at all gridpoints for the combinations F(u) = 0.98, K ≥ 1 and F(u) = 0.99, K > 2; the pair with the smallest acceptable run parameter seems to be F(u) = 0.98, K = 1, though the test statistic values are even closer to 0 for F(u) = 0.98 and K = 2. The movie of the test, made in this case as a function of ω, confirms these conclusions: the combination F(u) = 0.98, K = 2 is the lower endpoint of the region of well-specified models over the whole interval [0, 1], and so is the best possible choice for extreme-value analysis.
Pointwise maximum likelihood estimates with the two combinations F(u) = 0.98, K = 1 and F(u) = 0.98, K = 2, and using the intervals estimator for the two thresholds corresponding to the 0.96- and the 0.98-quantiles, are shown in Figure 5.3. The confidence limits for the intervals estimator were calculated pointwise by the bootstrap procedure presented in Section 2.1.5, and for the maximum likelihood estimator, they were based on asymptotic normality using a sandwich information matrix, accounting for the possible local misspecifications over the grid. Comparison of the maximum likelihood estimates shows that the two auxiliary parameter combinations give estimates fairly close to each other and to the true extremal index function. The intervals estimates are above the true extremal index function, and their discrepancy due to the use of
Figure 5.3: Extremal index function estimates by pointwise intervals (left panel) and pointwise maximum likelihood estimators (right panel) for the jittered M4 process, together with the extremal index function of the pure M4 (solid black lines) and 95% confidence intervals (dashed lines). For the intervals estimator, thresholds u with F(u) = 0.96 (red line) and with F(u) = 0.98 (orange line) were used, and for the maximum likelihood estimates, the combinations F(u) = 0.98, K = 1 (dark blue) and F(u) = 0.98, K = 2 (light blue).
different thresholds is larger. In this case, the combined application of the misspecification tests and maximum likelihood estimation looks preferable.
5.2 Data analysis
We considered two simultaneous sequences of daily maximum temperatures from Arosa, a mountain site, and Bern, an urban one, both in Switzerland. The observations span the period from January 1, 1961 to June 1, 2006. They were deseasonalized and de-trended in the same way as the Central England mean temperatures and the Neuchâtel daily minimum temperatures. We used only summer data, from the months June, July and August. Though the ten-year median of the deseasonalized temperatures at Arosa gives an indication of nonstationarity at the end of the century, we first treated the two series as stationary.
In this case, the means of the misspecification test and its variance were relatively high for higher thresholds, as Figure 5.4 demonstrates, so we created the movies of the test statistics along the [0, 1] interval, with ω playing the role of time. The movies showed regions of strong misspecification on the (u, K) plane, especially near the Bern margin. Some instants of T(θ̂) are presented in Figure 5.5, showing that the 0.98-quantile at all reasonable run parameter values behaves badly,
Figure 5.4: Grid average (left panel) and square root of variance over the grid (right panel) of the information matrix test T(θ̂), as a function of u and K, for the complete series of the Arosa-Bern bivariate daily maximum temperatures. Horizontal foreground axis: threshold; horizontal left axis: run parameter K; vertical axis: the mean or the standard error of the test statistics.
and indicating other regions of misspecification too. The plots suggest that only very low or very high thresholds with large run parameters yield models acceptably close to the limiting point mass-exponential mixture.
The almost general bad behaviour raises the question whether the fundamental assumption of stationarity is valid or not. A look at the 1-gap sequence reveals that 2003 is exceptional: it is represented by many zero 1-gap values and very short inter-cluster gaps both in Bern and at Arosa. The 1-gaps in other years are generally longer, with fewer zeroes. Also, the gaps from 2003 make up a large part of the series. Removing 2003 from the original set and performing the misspecification tests again, the difference is easily discernible, as Figure 5.6 shows: misspecification almost disappears near the Bern margin, and weakens considerably in the middle of the interval [0, 1]. The model now seems to be well specified for all (u, K) pairs tried, so in order to choose a combination for a model, we apply the guideline to select the area where misspecification is the lowest and estimates are stable for the largest part of the unit interval.
In Figure 5.7, the region of lowest average misspecification is (F(u) = 0.96, K = 4). This is also the best pair for the sequences including 2003. Calculating the extremal index functions with and without 2003, we see that the maximum likelihood estimator is not very sensitive to the difference. The intervals estimator fluctuates more strongly as a function of ω, and is rather more sensitive to whether we include 2003 or not. The movies, as a function of ω, show that the region
CHAPTER 5. MULTIVARIATE METHODS

[Figure: twelve surface panels titled Bern, Tau = 0.09, 0.18, 0.27, 0.36, 0.45, 0.54, 0.63, 0.72, 0.81, 0.9, and Arosa.]
Figure 5.5: The variation of the test statistic $T(\hat\theta(K))$ with τ, for the complete series of the Arosa-Bern bivariate daily maximum temperatures. Horizontal foreground axis: threshold; horizontal left axis: run parameter K; vertical axis: $T(\hat\theta(K))$.

[Figure: twelve surface panels titled Bern, Tau = 0.09, 0.18, 0.27, 0.36, 0.45, 0.54, 0.63, 0.72, 0.81, 0.9, and Arosa.]
Figure 5.6: The variation of the test statistic $T(\hat\theta(K))$ with τ, for the Arosa-Bern bivariate daily maximum temperatures without 2003. Horizontal foreground axis: threshold; horizontal left axis: run parameter K; vertical axis: $T(\hat\theta(K))$.

[Figure: two surface panels, titled Mean of T and St.error of T.]
Figure 5.7: Grid average (upper row) and square root of variance over the grid (lower row) of the information matrix test $T(\hat\theta)$ as a function of u and K, for the complete series of the Arosa-Bern bivariate daily maximum temperatures. Horizontal foreground axis: threshold; horizontal left axis: run parameter K.
of sensitivity to data from 2003 is approximately τ ∈ [0, 0.7] (Figures 5.5 and 5.7), since the
values of the misspecification tests differ strongly for these values. This region coincides with
the region in Figure 5.8 where the intervals estimates differ strongly. The maximum likelihood
estimates, with the choice F(u) = 0.96, K = 4, seem to be less sensitive to the inclusion or
exclusion of 2003. The estimates differ noticeably only in the interval τ ∈ [0.3, 0.7], and there
the estimate including 2003 lies below the estimate excluding it. Thus, the extremal index function
estimates shift towards stronger dependence and larger average cluster sizes with the inclusion
of data from 2003. This agrees with common sense: 2003 brought heatwaves uncommon until
then, with large clusters of hot days all over Europe, including Switzerland. Their inclusion may
shift the estimated extremal index noticeably towards stronger clustering.
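The intervals estimator referred to above is, at each gridpoint, the moment-based estimator of Ferro and Segers applied to the interexceedance times of the projected univariate series; the following is a sketch under that assumption, not the exact code used here.

```python
import numpy as np

def intervals_estimator(exceed_times):
    """Ferro-Segers intervals estimator of the extremal index,
    computed from the interexceedance times of a threshold."""
    T = np.diff(np.asarray(exceed_times, dtype=float))
    if T.max() <= 2:
        est = 2 * T.sum() ** 2 / (T.size * (T ** 2).sum())
    else:
        est = 2 * (T - 1).sum() ** 2 / (T.size * ((T - 1) * (T - 2)).sum())
    return min(1.0, est)
```

Unlike the likelihood estimator it needs no run parameter, which partly explains its stronger fluctuation as a function of the direction τ; its confidence bands in Figure 5.8 are bootstrap percentiles.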
5.3 M4 modelling
5.3.1 M4 approximation
Theorem 2.28 (Smith and Weissman, 1996) offers an alternative way of estimating the extremal
structure of a multivariate process. In the multivariate case an M4 process is defined as
\[
Y_{td} = \max_{k} \max_{l} a_{lkd} Z_{l,t-k}, \qquad d = 1, \dots, D, \quad t \in \mathbb{Z},
\]
where D is the number of simultaneous time series, $\{Z_{li},\ l \in \mathbb{N},\ i \in \mathbb{Z}\}$ are sequences of independent random variables with the unit Fréchet distribution $F(x) = e^{-1/x}$, $0 < x < \infty$, and the
[Figure: two panels of extremal index curves against τ ∈ [0, 1]; vertical axis: extremal index.]
Figure 5.8: The intervals (left panel) and the 4-gaps maximum likelihood (right panel) estimates, using a threshold with F(u) = 0.96. The estimates including 2003 are plotted as thick solid black lines, the estimates excluding it as red lines. Dashed lines are 95% confidence bands: bootstrap percentiles for the intervals estimator, and based on the sandwich variance for the maximum likelihood estimates.
coefficients $\{a_{lkd},\ l \in \mathbb{N},\ k \in \mathbb{Z},\ d = 1, \dots, D\}$ are nonnegative constants satisfying
\[
\sum_{l=1}^{\infty} \sum_{k=-\infty}^{\infty} a_{lkd} = 1, \qquad d = 1, \dots, D.
\]
For the sake of estimation, we take $l = 1, \dots, L$ with finite L and $k = 1, \dots, K$; the
coefficients $a_{lkd}$ can thus be considered as D matrices, each with L rows and K columns.
The estimation of the M4 model is somewhat more difficult than the estimation of an M3
process. Though the filter matrices $a_{lkd}$ have a unit sum for all d, they are not natural to model
with a Dirichlet variable, because they are never observed simultaneously in the series. If we
observe a cluster of extreme events in the multivariate stationary sequence $Y_t$, the observations
$Y_{t1}$ in the period are proportional to $a_{l_1 k_1 1}$ for some $l_1$ and $k_1$, the observations $Y_{t2}$ are proportional to $a_{l_2 k_2 2}$ for some $l_2$ and $k_2$, and so on. Thus, the observations of the extreme periods in
the sequence $Y_t$ cannot be put into direct relationship with the filter matrices.
For an M3 process, the estimation problem could be considered as a mixture problem. By the
model, the observed clusters are generated by L independent unobservable Fréchet sequences,
so that we have L different signatures $a_{li}/\sum_{k=1}^{K} a_{lk}$, each occurring with probability $p_l$. We
supposed that the clusters do not appear in their pure form, but blurred, and we assumed that
the distribution of the blurred clusters in the observed process, conditional on l, is a mixture
of Dirichlet distributions, so that their means are equal to the signature l. Moreover, for an
M3 process $p_l = \sum_{k=1}^{K} a_{lk}$. By estimating the parameters $p_l$ and $\alpha_{lk}$ of the Dirichlet mixture,
we could straightforwardly calculate the estimated filter coefficients $\hat a_{lk}$. Such an immediate
decomposition into signatures and cluster type probabilities is unavailable for M4 processes,
as there is no obvious relationship between the sum of the filter coefficients $\sum_{k=1}^{K} a_{lkd}$ in the
marginal processes and the signature probability. To obtain a Dirichlet mixture form, we need to
prove a proposition, which exploits the independence in the underlying Fréchet array.
Proposition. Let $\{u_n\}$ be a sequence of thresholds with $u_n \to \infty$, and suppose that for a given
l and t, $Z_{lt} > u_n$. Let $I_t$ denote the index set $\{t+1, \dots, t+K\}$. Then
\[
\Pr\left\{ Y_{t'd} = a_{l,t'-t,d} Z_{lt},\; t' \in I_t,\; d = 1, \dots, D \;\middle|\; Z_{lt} > u_n \right\} \to 1,
\]
as $u_n \to \infty$.
Proof.
\[
\begin{aligned}
&\Pr\left\{ Y_{t'd} = a_{l,t'-t,d} Z_{lt} \text{ for all } d,\; t' \in I_t \;\middle|\; Z_{lt} > u_n \right\} \\
&\quad = \Pr\left\{ a_{l,t'-t,d} Z_{lt} > a_{l',t'-t'',d} Z_{l't''} \text{ for all } d,\; t' \in I_t,\; (l',t'') \neq (l,t) \text{ with } t' \in I_{t''} \;\middle|\; Z_{lt} > u_n \right\} \\
&\quad = \Pr\left\{ Z_{lt} > a_{l,t'-t,d}^{-1}\, a_{l',t'-t'',d}\, Z_{l't''} \text{ for all } d,\; t' \in I_t,\; (l',t'') \neq (l,t) \text{ with } t' \in I_{t''} \;\middle|\; Z_{lt} > u_n \right\} \\
&\quad \geq \Pr\left\{ Z_{l't''} \leq a_{l,t'-t,d}\, a_{l',t'-t'',d}^{-1}\, u_n \text{ for all } d,\; t' \in I_t,\; (l',t'') \neq (l,t) \text{ with } t' \in I_{t''} \;\middle|\; Z_{lt} > u_n \right\} \\
&\quad = \prod_{\substack{l' = 1, \dots, L;\; t'' : t' \in I_{t''} \\ (l',t'') \neq (l,t)}} \Pr\left\{ Z_{l't''} \leq a_{l,t'-t,d}\, a_{l',t'-t'',d}^{-1}\, u_n \text{ for all } d,\; t' \in I_t \right\},
\end{aligned}
\]
where the conditioning could be dropped in the last step because the remaining variables are independent of $Z_{lt}$; and for any fixed set of coefficients $a_{lkd}$, each factor, and hence this probability, tends to 1 as $u_n \to \infty$.
Thus, as we increase the thresholds, the neighbourhoods of large values in the sequence $Y_t$
become more likely to be proportional to the unobserved extreme shock variable $Z_{lt}$ occurring
in the lth generating sequence at time t. The impact of an individual shock exceedance $Z_{li}$ at
time i appears simultaneously in all component sequences $Y_{td}$, and all variables in the observed
D-variate cluster are proportional to the same generating shock variable $Z_{lt}$, with the lth rows
from each filter matrix $a_{lkd}$ as coefficients. Figure 5.9 illustrates what the signatures look like
in the multivariate case.
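The proposition can be checked numerically: planting one very large shock in a simulated generating array and reading off the following K observations recovers the corresponding filter row in every component. The setup below (coefficient values, sizes, the planted shock of size $10^{10}$) is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, D, n = 2, 3, 2, 400
a = rng.uniform(size=(L, K, D))
a /= a.sum(axis=(0, 1), keepdims=True)

# padded unit Frechet shocks; padded column t0 + K - 1 is time t0
Zp = -1.0 / np.log(rng.uniform(size=(L, n + K - 1)))
t0, big = 200, 1e10
Zp[0, t0 + K - 1] = big                      # a huge shock in sequence l = 0

Y = np.empty((n, D))
for t in range(n):
    window = Zp[:, t:t + K][:, ::-1]         # window[l, k] = Z[l, t - k]
    Y[t] = np.max(a * window[:, :, None], axis=(0, 1))

# the K-step neighbourhood after the shock is proportional to a[0, :, :]
profile = Y[t0:t0 + K, :] / big              # approximately equals a[0]
```

With overwhelming probability the planted term dominates every competing $a_{l'k'd} Z_{l't''}$ in the window, so `profile` reproduces the first filter row simultaneously in both components, as the proposition asserts.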
For a D-dimensional M4 process $Y_t$, given by its set of coefficients $a_{lkd}$, define the sequence
$W_i$ by $W_{D(t-1)+d} = Y_{td}$ (see Figure 5.10). The finite-dimensional distributions of $W_i$ imitate
those of the corresponding M4 variables. The limiting cluster profiles of $W_i$ are given by the
same mappings of the limiting signatures of the original M4. Define also the M3 process $W^*_t$
[Figure: left panels show the filter matrices $a_{lk1}$ and $a_{lk2}$ for l = 1, 2, 3; middle panels the products $a_{lk1} Z_{lk}$ and $a_{lk2} Z_{lk}$; right panels the observed components d = 1, 2.]
Figure 5.9: The effect of an exceedance in the underlying shock variables in the example of Section 5.2 with L = 3, K = 6 and D = 2. The filter matrices, $a_{lk1}$ in blue and $a_{lk2}$ in violet, are shown in the left panels. The products of the three shock sequences $Z_{lt}$ (black spikes) and the two filter matrices are represented as the blue and violet parts at the base of the spikes in the middle panels. The position is such that the second element in the first row of the filter matrices multiplies the extreme shock variable $Z_{1t}$. The resulting observed M4 series is plotted in the right panels.
with underlying shock variables $Z^*_{lt}$, $l = 1, \dots, L$, and filter coefficients
\[
b_{l,D(k-1)+d} = D^{-1} a_{lkd}, \qquad l = 1, \dots, L.
\]
It follows from the definition that the coefficients $b_{lk}$ satisfy the conditions
\[
D \sum_{l=1}^{L} \sum_{k=1}^{K} b_{l,d+D(k-1)} = 1, \qquad d = 1, \dots, D, \tag{5.1}
\]
and also
\[
\sum_{l=1}^{L} \sum_{k=1}^{DK} b_{lk} = 1,
\]
which ensures that $W^*_t$ has unit Fréchet margins. The Proposition is straightforwardly applicable
to $W^*_t$, ensuring that the joint finite-dimensional distributions of its extreme clusters (and in fact,
all finite-dimensional distributions up to order DK) tend to the same distribution, concentrated
on the same signatures, as those of $W_t$. This M3 has the same nice structure already presented in Section
4.1.1, with the information about its parameters decomposable and estimable as the signatures
$a_{li}/\sum_{k=1}^{K} a_{lk}$ and their probabilities $\sum_{k=1}^{K} a_{lk}$.
[Figure: two panels titled Observed sequences and Corresponding M3.]
Figure 5.10: A bivariate M4 process and the corresponding univariate sequence $W_t$. The extreme cluster in the M4 and its counterpart in $W_t$ are highlighted by a lighter colour.
As a consequence, we can estimate an M4 process by almost the same procedures as an M3
process. By rearranging the D-dimensional extreme clusters of $Y_t$ into one-dimensional vectors,
we create the clusters of the corresponding univariate process $W_t$. Then, using the asymptotic
distributional equality between its signatures and those from the M3 process, we can estimate
these signatures and their probabilities as we did for an M3. This requires only a few modifications
to the estimation procedure outlined for the univariate case. The preliminary declustering needs
some change, calculating correlations for the matrices $Y_{j_1}, \dots, Y_{j_K}$ instead of correlations for
a vector. Also, we have to construct the sequence $W_t$. Once we have the best correlated clusters
and have constructed their univariate counterparts, the estimation procedure is the same as in the
univariate case.
5.3.2 Modifications of the univariate method for the multivariate case
The modification to the preliminary cluster identification procedure, given in Section 4.1.1,
consists of first finding the periods where at least one of the components of $Y_i$ is extreme, and then
finding the best correlated positions of their neighbourhoods.
(0) Select the observations $Y_t$ which exceed a high threshold u in at least one component.
Suppose that we obtain R clusters by a multivariate runs declustering scheme. Two
extremes $Y_{td}$ and $Y_{t'd'}$ belong to the same cluster if $t' - t$ is smaller than the run parameter.
The extent of the cluster is the total extent of the period in which all observations $Y_t$
that are extreme in at least one component are separated by less than the run parameter.
Then iteratively repeat the following procedure for $i = 1, \dots$:
(1) Select a sufficiently large index set, the neighbourhood $I^{(i)}_r$, $r = 1, \dots, R$, around each
cluster, which is likely to contain the entire M4 cluster.
(2) Use some cluster analysis method, for example a K-means procedure with a fixed number
of cluster types, to group the neighbourhoods $\{Y_j : j \in I^{(i)}_r\}$.
(3) Calculate the correlation of the intersecting part of each neighbourhood $\{Y_t : t \in I^{(i)}_r\}$
with each of the mean cluster profiles, in shifted positions, for a range of shifts. Choose
the positions of the best correlations, and redefine accordingly the index sets $I^{(i+1)}_r$, $r = 1, \dots, R$.
(4) If the index set did not change, or oscillation has set in, accept $I^{(i+1)}_r$ as the most likely
cluster positions in the original sequence $Y_t$, and choose the final index sets $J_r \supseteq I^{(i+1)}_r$ so
that all exceedances are included and all clusters in each marginal sequence are of equal
length K. Let $\{j^{(i)}_r + 1, \dots, j^{(i)}_r + K\} = J^{(i)}_r$, the set of indices forming the neighbourhood
of the rth cluster. Define the variables $W_{jr}$ by
\[
\begin{aligned}
W^{(i)}_{(d-1)K+1,\, r} &= Y_{j^{(i)}_r + 1,\, d}, \\
W^{(i)}_{(d-1)K+2,\, r} &= Y_{j^{(i)}_r + 2,\, d}, \\
&\;\;\vdots \\
W^{(i)}_{dK,\, r} &= Y_{j^{(i)}_r + K,\, d},
\end{aligned} \tag{5.2}
\]
for $d = 1, \dots, D$. This definition is slightly different from that visualized in Figure 5.10:
here the cluster segments of each component $Y_{td}$ remain together, which gives some visual
advantage in surveying the clusters on the screen. The DK-dimensional vectors $\{W_{rj}\}$
will be the modelled data set. Otherwise, if the index set $I^{(i+1)}_r$ was different from $I^{(i)}_r$,
repeat the procedure from step (1).
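Step (0), the multivariate runs declustering, can be sketched in a few lines; the iterative correlation alignment of steps (1)-(4) is omitted, and the function below is an illustration rather than the procedure actually implemented in the thesis.

```python
import numpy as np

def runs_decluster(Y, u, run):
    """Group the times at which any component of Y exceeds u into
    clusters: two exceedance times fall in the same cluster when they
    are separated by less than `run` time steps."""
    times = np.flatnonzero((np.asarray(Y) > u).any(axis=1))
    if times.size == 0:
        return []
    clusters, current = [], [int(times[0])]
    for t in times[1:]:
        if t - current[-1] < run:
            current.append(int(t))
        else:
            clusters.append(current)
            current = [int(t)]
    clusters.append(current)
    return clusters
```

For instance, with exceedances of u at times 1, 2 and 7 and a run parameter of 3, the first two times form one cluster and time 7 a second one.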
The data forming the clusters are now the DK-variate vectors $W_r$, $r = 1, \dots, R$, serving as
the one-dimensional clusters associated with the original D-dimensional clusters. Similarly to the
univariate case, we define the DK-variate random variables $X_r$ by
\[
X_r = \left( \frac{W_{ri}}{\sum_{j=1}^{DK} W_{rj}},\; i = 1, \dots, DK \right),
\]
and we consider them as a sample from a noisy constrained M3 process with filter matrix $b_{li}$.
Corresponding to the Dirichlet mixture modelling presented in Chapter 4, the formulation of
our assumption about the noise is the following:
Main assumption:
The set of the observed extreme M3 clusters $X_r$ derived from a D-dimensional process
$Y_t$ can be modelled as a sample from an M-component mixture of DK-variate Dirichlet
distributions.
As in the univariate case, we expect that the componentwise means $c^*_{mi} = \alpha_{mi}/\sum_{k=1}^{DK} \alpha_{mk}$
of the Dirichlet distributions tend to the signatures $b_{li}/\sum_{j=1}^{DK} b_{lj}$ for some l, with increasing
Dirichlet parameters and decreasing variance of the normalized clusters as $n \to \infty$ and $u_n \to \infty$.
The Dirichlet types form groups: $M_l = \{m : c^*_{mi} \text{ approximates signature } b_{li}\}$. Obviously, the
estimating procedure will yield the Dirichlet types, not the signatures. Arguments similar to
those used in the univariate case suggest that the multivariate extremal index can be calculated
directly from the Dirichlet types, avoiding the identification of the groups $M_l$.
To check whether the basic assumption of the Dirichlet mixtures is acceptable, we can use
beta quantile-quantile plots when each of the fitted Dirichlet types corresponds to a different filter
matrix row. We calculate the posterior classification of the observed clusters, and compare
each component of each one with the marginal beta distribution of the Dirichlet distribution
corresponding to its posterior type. Visual inspection of the quantile-quantile plot gives an
impression of the validity of the Dirichlet assumption.
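This check uses the fact that the ith coordinate of a Dirichlet(α) vector follows a Beta$(\alpha_i, \alpha_0 - \alpha_i)$ distribution with $\alpha_0 = \sum_j \alpha_j$. A sketch of computing the QQ points follows; the reference beta quantiles are obtained here by Monte Carlo simply to avoid a special-function dependency, which is not how the thesis computed them.

```python
import numpy as np

def beta_qq_points(x, alpha, i, nsim=200000, seed=0):
    """Theoretical-vs-sample quantile pairs for component i of the
    clusters assigned to one Dirichlet type with parameter vector alpha."""
    rng = np.random.default_rng(seed)
    # Beta(a, b) simulated as G_a / (G_a + G_b) with independent gammas
    g1 = rng.gamma(alpha[i], size=nsim)
    g2 = rng.gamma(alpha.sum() - alpha[i], size=nsim)
    ref = g1 / (g1 + g2)
    sample = np.sort(np.asarray(x))
    probs = (np.arange(1, sample.size + 1) - 0.5) / sample.size
    theoretical = np.quantile(ref, probs)
    return theoretical, sample
```

Plotting `theoretical` against `sample` for each component and each posterior type reproduces panels of the kind shown in Figures 5.13 and 5.19.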
There is an issue that had no counterpart in the univariate problem: the constraints $\sum_{k,m} a_{mkd} = 1$
for all d in the original process $Y_t$ put the constraints (5.1) on the filter coefficients $b_{mk}$, which
are not automatically satisfied in estimation. Various ways of building them into the optimization
problem have so far not yielded satisfactory results. Until this is resolved, we use these constraints
as a means of testing the quality of the fitted model: if the sums $\sum_{k,m} \hat a_{mkd}$ for all d are sufficiently close
to 1, we accept that the M3 process found is indeed the one-dimensional counterpart to an M4,
so that it is an acceptable limiting model for the investigated data.
We briefly summarize the proposed estimation procedure:
(1) Transform the sequences $Y_t$ to have unit Fréchet margins. This requires a classical GPD
analysis of all marginal sequences.
(2) Select clusters, that is, the M3-neighbourhoods of exceedances of a sufficiently high threshold,
by the procedure described at the beginning of this section. Calculate the M3-signatures
$X_r$ by dividing each vector of neighbourhoods by its sum. These comprise
the extremal clusters in the constrained M3 limit corresponding to the process $Y_t$.
(3) For the M3-signatures, fit the Dirichlet mixture model by the EM algorithm, estimating
the Dirichlet parameters of the noise, with a few plausible values $M = 1, \dots, M_{\max}$ for the
number of cluster types, and launching the procedure with a number of different initial
values for each M. The Bayes Information Criterion can be used as a selection criterion for
the best fit.
(4) Based on the parameters $\alpha_{mk}$ and $\pi_m$ of the selected fit, estimate the filter matrix parameters by
\[
\hat b_{mj} = \frac{\hat\pi_m \hat\alpha_{mj}}{\sum_{i=1}^{DK} \hat\alpha_{mi}}, \qquad m = 1, \dots, M, \quad j = 1, \dots, DK, \tag{5.3}
\]
\[
\hat a_{mkd} = D\, \hat b_{m,(d-1)K+k}, \qquad m = 1, \dots, M, \quad k = 1, \dots, K, \quad d = 1, \dots, D. \tag{5.4}
\]
This yields the parameter estimates $\hat b_{mj}$ and $\hat a_{mkd}$ of the Dirichlet mixture approximation
to the limiting M3 and M4 processes, respectively.
(5) Estimate the extremal index function at any point $\omega$ on the unit simplex $S_D$ by
\[
\hat\theta(\omega) = \frac{\sum_m \max_k \max_d \hat a_{mkd}\, \omega_d}{\sum_m \sum_k \max_d \hat a_{mkd}\, \omega_d}. \tag{5.5}
\]
(6) Variances for the estimates are based on the delta method and the asymptotic normality
of the estimates from the EM algorithm; as in the univariate case, this can fail to give a
good variance estimate.
(7) Use the available diagnostics to assess the quality of the fit: the beta quantile-quantile
plot for the validity of the noise assumption described in Section 4.2.1, and the check
on the sums of the elements of the estimated filter matrices $\hat a_{mkd}$. The latter consists of
calculating the sums and their confidence intervals based on asymptotic normality, to see
whether the confidence intervals contain 1.
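Steps (4) and (5) are simple array operations once the EM fit has produced the Dirichlet parameters. A sketch, assuming the mixture weights $\pi_m$ are stored in `pi` and the parameters $\alpha_{mj}$ in `alpha`, with 0-based indices:

```python
import numpy as np

def filter_estimates(alpha, pi, D, K):
    """Equations (5.3)-(5.4): Dirichlet parameters -> filter estimates."""
    # b_hat[m, j] = pi[m] * alpha[m, j] / sum_i alpha[m, i]
    b_hat = pi[:, None] * alpha / alpha.sum(axis=1, keepdims=True)
    # a_hat[m, k, d] = D * b_hat[m, (d-1)K + k]: undo the (5.2) stacking
    a_hat = D * b_hat.reshape(len(pi), D, K).transpose(0, 2, 1)
    return b_hat, a_hat

def extremal_index(a_hat, omega):
    """Equation (5.5): extremal index function at omega on the simplex."""
    s = a_hat * omega[None, None, :]          # a_hat[m, k, d] * omega_d
    return s.max(axis=(1, 2)).sum() / s.max(axis=2).sum()
```

Two sanity checks: a single cluster type concentrated at one time step gives $\hat\theta(\omega) = 1$ (no clustering), while a single type spread uniformly over K steps gives $\hat\theta(\omega) = 1/K$.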
5.3.3 Simulation
We again use the jittered M4 example of Section 5.1.2. Figure 5.11 gives an impression of the
effect of the jittering. The leftmost column presents the original pure M4 signatures. The other
five columns show a sample of the extreme clusters of the jittered sequence, before shifting them
to the best correlated position, defined so that the first exceedance above the 0.98-quantile of
each is at the second position in the neighbourhood. The clusters appear in the plot in order of
[Figure: rows titled Type 1, Type 2 and Type 3; columns of fifteen sample clusters.]
Figure 5.11: A few observed clusters of the jittered M4 process. The first component of the observed bivariate sequence is plotted in blue, the second in violet.

[Figure: a BIC panel and signature panels titled Noisy M4 (probabilities 0.1, 0.41, 0.35, 0.14) and M4 (probabilities 0.45, 0.3, 0.24).]
Figure 5.12: The BIC (left upper panel) and the fitted M4 parameters for the best model (right upper panel and middle row). The solid lines are the estimates, with colours distinguishing the two components of the observed bivariate sequence; dashed lines: 95% confidence intervals. The pure signatures are presented in the bottom row for comparison. The numbers in the titles are the occurrence probabilities of the plotted signatures.
their occurrence in the sequence $Y_t$, so there is no relationship between the pure signatures on
the left and the observed clusters on the right.
The fits of the Dirichlet mixture models were performed for $M = 2, \dots, 7$, each with at least
50 different initial values. The same technique was applied to choose the initial value combinations
as in the univariate case, described in Section 4.2.2. Similarly to the univariate simulations,
many of the local optima found defined solutions having at least one cluster type with population
less than or equal to 3. These were dropped as unrealistic. The Bayes Information Criterion
was calculated for all remaining solutions.
The best model was found to contain not 3 but 4 signatures, as can be seen in Figure 5.12.
Extracting the filter coefficients, it turns out that all three true types were found as almost
perfect hits, but a fourth type appeared as well. Also, the estimated signature probabilities
differ somewhat from the true ones. These facts suggest imperfect classification of the observed
clusters, probably due to the distorting effect of noise and the failure of the preliminary cluster
identification.
The test statistic $S = \sum_{m,k} \hat a_{mk1} = 2 - \sum_{m,k} \hat a_{mk2}$ equals 1.06 (0.13), with the standard
error in brackets. The sum is sufficiently close to 1, so the M3 limit process found indeed
corresponds to a bivariate M4. Since all the signatures found differ, we can apply the beta
quantile-quantile plots to check the quality of the fit. Figure 5.13 reflects the fact that the
assumption of the Dirichlet mixture does not perfectly match the true noise distribution, which
was in fact a heteroscedastic lognormal. The fit seems to be worst for the most frequent cluster
type (see the bottom row of Figure 5.13, referring to the signature plotted in the lower left corner
of Figure 5.16), and there is a systematic deviation in several other components too. Despite the
problems in signature identification and the weakness of the Dirichlet assumption, the estimated
extremal index function is quite close to that of the pure M4 process, as can be seen in the left
panel of Figure 5.14. The right panel shows remarkable invariance of the estimates given by
the BIC-best models with $M = 3, \dots, 7$ with respect to M. This is due to the fact that for
every $M \geq 3$, the best fits find all the characteristic cluster types. The estimates seem to be
slightly closer to the true extremal index function for higher M; this casts doubt on choosing
the best model using BIC, as is common in mixture models, and suggests simply choosing the
most complex model that contains only types with sufficient population.
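The constraint check behind S can be expressed in a couple of lines; note that for estimates built via (5.3)-(5.4) the total mass $\sum_{m,k,d} \hat a_{mkd}$ equals D exactly, so for D = 2 the two margin sums automatically satisfy $S_1 = 2 - S_2$. A sketch (names are ours):

```python
import numpy as np

def margin_sums(a_hat):
    """Per-margin sums sum_{m,k} a_hat[m, k, d]; each should be close
    to 1 if the fitted M3 really is the flattened form of an M4."""
    return a_hat.sum(axis=(0, 1))
```

Comparing each margin sum with 1, using standard errors from the asymptotic normality of the EM estimates, gives the diagnostic of step (7).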
[Figure: a grid of QQ plots; columns Y[1]-Y[6], axes Theoretical vs Sample.]
Figure 5.13: The diagnostic beta QQ plots for the best four-component fit selected by the BIC for the jittered M4 process. Columns refer to the elements of the observed extreme cluster from the first to the sixth day; the four rows correspond to the four types of the model. Blue corresponds to the first component of the observed bivariate sequence, violet to the second. The horizontal axes correspond everywhere to the theoretical quantiles, the vertical axes to sample quantiles.
5.4 Data analysis
As a data example for the application of the M4 approach, we reconsider the data example of
Section 5.2, the bivariate daily maximum temperature sequences from Arosa and Bern, in their
deseasonalized, detrended, Fréchet-transformed form. We assumed stationarity for their joint
distribution, as in the pointwise likelihood estimation, and to be closer to this assumption, we
left out 2003. The neighbourhoods of exceedances above the 0.98-quantile were considered as the
basic input elements of the cluster selection procedure described in Section 5.3.2. A few clusters
in the best correlated positions found are shown in Figure 5.15: a couple of repeatedly occurring
patterns of single hot days either in Bern or at Arosa emerge, beside the more irregular, flatter
profiles with two or more hot days at one or both of the sites.
The fits of the M4 approximation with $L = 2, \dots, 8$ and at least 50 initial value sets for each
L showed a 4-component model to be the best. This is plotted in Figure 5.16, with its four
[Figure: two panels of extremal index curves against [0, 1]; right panel legend: True, L = 2, ..., L = 7.]
Figure 5.14: The estimated extremal index function (left and right panels, red solid line) calculated from the best model, and its 95% confidence interval (red dashed) for the jittered M4 example. The estimate is compared to the theoretical extremal index function of the pure M4 process (solid black line, left panel). The right panel shows the estimates by models with $M = 2, \dots, 7$ in various colours.
estimated signatures. The profiles are sufficiently far from each other to be taken as different
signatures, not only different types, since no two of them have confidence intervals that overlap
everywhere. We find the types that the naked eye could already roughly identify
in the sample of clusters in Figure 5.15: the single one-day excursions to very hot temperatures
at Bern and at Arosa. When interpreting the signatures found, we should bear in mind two
important facts: first, the two fundamental series were deseasonalized, detrended temperatures,
that is, they represent anomalies, measured in units of usual variability; and second, these
profiles are based on vectors of observations scaled to have unit sum, so they do not furnish
any information about the absolute size of the extreme deviation, only about its relative
size with respect to the neighbouring deviations. The signatures with single peaks show that in
Bern, the deviation from the normal state, measured in units of usual variability, on single
extremely hot days is not so far from the usual as at Arosa, where the oscillation of the extreme
temperatures is higher. These profiles might represent local variations in the temperature, caused
for instance by a cloudless day. The upper right panel, with its flat-topped pattern occurring at
both sites, might group the heatwaves associated with stable high-pressure systems dominating a
large area.
The cluster signatures found in the 5-, 6-, 7- and 8-component models with the lowest BIC
all contain variants of the three more peaked cluster forms, as Figure 5.17 shows in the five-
[Figure: six panels of normalized cluster profiles.]
Figure 5.15: A few observed clusters, normalized to have unit sum, of the bivariate data set of summer daily maximum temperatures at Bern (violet) and Arosa (blue), after the preliminary declustering.

[Figure: a BIC panel and four signature panels with occurrence probabilities 0.36, 0.4, 0.16 and 0.07.]
Figure 5.16: The BIC (upper left panel) and the fitted M4 parameters for the best model. Solid violet line: Bern; solid blue: Arosa; dashed lines: 95% confidence intervals. The numbers in the titles are the occurrence probabilities of the signatures.
[Figure: signature panels of the 3-component model (probabilities 0.08, 0.41, 0.51) and of the 5-component model (probabilities 0.07, 0.22, 0.14, 0.2, 0.38).]
Figure 5.17: The cluster types in the BIC-best 3-component (top row) and 5-component (middle and bottom rows) models. The signatures of Bern are shown in violet, those of Arosa in blue. The numbers in the titles are the occurrence probabilities of the signatures.
[Figure: two panels of extremal index curves against [0, 1]; left panel legend: M4, Intervals, MLE; right panel legend: L = 2, ..., L = 8.]
Figure 5.18: The extremal index estimate given by the best 4-component mixture model (solid black), compared to the intervals (solid orange) and the maximum likelihood (solid blue, left panel) estimates and to estimates by mixture models of different complexity (right panel), for the Arosa-Bern summer daily maximum temperatures. The dashed lines are the 95% confidence intervals.
component model. The best three-component solution fits one cluster type for the two types of
the single hot day in Bern and for the 2-day-long signature. As Figure 5.18 shows, the estimates
of the extremal index function by the 5- to 8-component models are quite similar, as with the
simulated example: all these models find the right form of the dominant peaks.
[Figure: a grid of QQ plots; columns Y[1]-Y[6], axes Theoretical vs Sample.]
Figure 5.19: The diagnostic beta QQ plots for the BIC-best four-component fit. Columns refer to the elements of the observed extreme cluster from the first to the sixth day; the four rows correspond to the four cluster types of the model. Observations from Bern are in violet, those from Arosa in blue.
Figure 5.19 shows a mixed picture of the quality of the fitted four-component Dirichlet
mixture model. Though not good, the fit is acceptable for the three more characteristically
shaped signatures, especially at positions 2-4, that is, from the second to the fourth day of the
extreme clusters of types 1-3, around the peaks. These are the positions that have a strong
influence on the extremal index function. The positions where the fit is bad are those that are
generally non-extreme, and which are generated by a small filter coefficient in the framework of
M4 processes. The diagnostic plots show the fourth type to be the collection of all non-categorizable
observed clusters. This could already be inferred from its Dirichlet parameters (not shown in the
plots): they were very small at every location, suggesting very high variance on every day of the
cluster. The quantile-quantile plots of this cluster type give the impression of a very bad fit. Since the
filter coefficients corresponding to this irregular type are small, and it is relatively rare (see
Figure 5.16), the fit remains acceptable as long as we calculate cluster statistics determined
mostly by the three prominent types. This is so for the extremal index function, too. Despite
the local lack of fit, the estimated extremal index function is close to what was found by the
other methods.
5.5 Summary
In this chapter, we developed and tested two new approaches to the estimation of the effect of
serial dependence on multivariate extremes: the pointwise maximum likelihood method, and
a Dirichlet mixture model based on the M4 limit of max-stable processes.
The maximum likelihood method was originally developed for univariate sequences. Its application
requires the selection of a run parameter K, besides the usual choice of a threshold u,
and we propose misspecification tests for these choices. The result of the misspecification tests
is summarized as a surface over the (u, K) plane. In order to obtain the multivariate extremal
index of a stationary sequence of random vectors, a univariate method must be used at each
point of a fine enough grid on the unit simplex, which means the misspecification tests have to be
calculated and checked at each gridpoint. This makes the application of the likelihood methods
awkward. Checking these surfaces in detail is practically impossible for sequences of dimension
higher than 2. Summaries of the misspecification test statistics can be used, but these give only
rough information about misspecified regions in the (u, K) plane, and there may remain locations of
misspecification on the simplex. Also, problems arise when the two marginal sequences admit
strongly differing run parameters $K_1$ and $K_2$, with $K_1 < K_2$. In such a case, there is no ideal
solution: choosing the lower run parameter $K_1$ of the first margin implies misspecification of
independence near the other margin, choosing the larger run parameter $K_2$ entails bias near the
first margin, and varying the run parameter over the grid is likely to yield abruptly changing
estimates. Although the method performs well in low-dimensional cases, where in addition we can
expect similar run parameters at the different margins, it is in general better adapted to finding
good estimates at the margins than for the multivariate extremal index.
Simulations show that when the surfaces of the misspecification tests can be checked in
enough detail, they indeed help to choose the best (u, K) combinations for each gridpoint.
Using the auxiliary parameter choices based on the smallest test statistic stable over the grid, the
point mass-exponential mixture approximation for the sequence of the K-gaps at the individual
gridpoints is the best possible. The estimates with these combinations were very close to the
true extremal index function in the simulated example.
The bivariate data example, simultaneous daily summer temperatures, highlighted the use-
fulness of the misspecification tests. The tests indicated misspecification on a large part of the
unit interval for most realistic (u, K) combinations. The cause turned out to be in part non-
stationarity in the sequence, mainly the effect of the extremely hot summer of 2003. Removing
this, the misspecification disappeared, and the best combination of threshold and run parameter
could be chosen. Though the extremal index function is not known in this case, the estimates
by maximum likelihood are in good agreement with the results from the other proposed
method, the M4 approximation.
As simple simulations and the data example have shown, the Dirichlet mixture model for the
M4 approximation has the potential to become a useful method for multivariate extreme-value
estimation.
In the multivariate case as in the univariate case, it is much less sensitive to declustering
parameters, threshold choice and the selection of other parameters. In cases where the
signatures were known, the method was able to give good signature estimates, which
hints at its good ability to investigate the trajectory of the process around extremes. As a
summary measure of clustering characteristics, the extremal index function estimates were
very close to reality in every simulated case checked. The method can yield much more
detailed information on the temporal structure of a process than does mere estimation of
the multivariate extremal index, by giving the best available average characteristic cluster
forms.
The M4 process with unit Fréchet shock variables is a valid limit for asymptotically de-
pendent or independent stationary max-stable processes. This suggests its potential use
in giving a complete probabilistic picture of the extremes of such a process.
An essential characteristic, the value of which may nevertheless be underestimated, is that
the fundamental variables for the M4 approximation, the surroundings of the measured
extremes, are physically linked variables. As opposed to classical methods based on com-
ponentwise maxima, the links between the simultaneously measured variables, imposed by
the complex climate system, are not broken. This may supply the additional advantage
of the opportunity to build physical information into the models, or a way to check the
physical plausibility of the statistical model obtained.
To transform it into a method able to deal with problems commonplace in applications,
most prominently spatial or spatio-temporal data, much work remains to be done:
a method giving reasonable variance estimates should be developed. Asymptotic normality
is of only limited use because of the asymmetry of the likelihood near zero, the model
misspecification entailed by the Dirichlet mixture assumption, and the constraints on the
filter matrix parameters. Another obvious choice, the application of bootstrap methods,
raises the question of how to estimate the uncertainty attached to the global maximum in
the presence of multiple local maxima of the mixture likelihood. Such issues are discussed
for example in Aitkin et al. (1981), though no solution is given there other than to avoid
the bootstrap in this case. Other possibilities like Bayesian estimation can also be considered.
The M4 approximation modelled by Dirichlet mixtures involves (DK + 1)L parameters
to estimate. In geostatistical problems, the number of locations D can often reach a few
hundred or more. The cluster length K too can be extremely large: consider hourly
observations of rainfall with clusters possibly days long. For sufficiently good cluster
modelling, we might also need to fit a model with large L, as may easily be the case if we
want to model weather patterns in a large region, with many mixed types when different
conditions determine the weather in different subregions. In such large-dimensional cases,
without reasonable simplifying assumptions on the patterns of Dirichlet parameters or on
the filter matrices, depending on the site locations, altitude or climatic zone variables, the
model can contain so many variables that finding the global maximum becomes practically
impossible. Such reasonable simplifying assumptions or other alternatives, for example
splines, must be investigated.
The modelling of the noise might be a critical question. The Dirichlet mixture model
requires many cluster types to yield a faithful description of the noise. The number of
extreme clusters in a typical data set generally does not allow a very large number of types,
and it is likely that one signature is modelled with one Dirichlet type. In this case the
Dirichlet mixture assumption corresponds to gamma-distributed noise around the mean
cluster patterns, and this can easily be false. The diagnostics applied to the simulations in
the uni- and multivariate cases indeed showed that it is only partially acceptable, mostly at
the peaks. Experience so far suggests that, as far as the extremal index is concerned, the
estimates are not very sensitive to this misspecification, at least if the noise is light-tailed,
but exploration of other possibilities can be interesting and necessary for other types of
noise.
Global climate change modifies the setup by destroying a basic assumption of the theory,
namely stationarity. The inclusion of time as a covariate is not impossible, since the likeli-
hood approach used for the M4 modelling allows it, but theory applicable in a
nonstationary setting is missing, and the toolkit for applications in such a case should be
developed.
The M4 model given in this form is either asymptotically dependent or independent.
Theoretical developments suggest that with some modifications, of which the most im-
portant is the introduction of light-tailed shock variables, it can model all other types of
extremal dependence (Heffernan et al., 2007). Methods built on these modified processes
will need substantial changes, since with finite-tailed shock variables, the signatures don't
necessarily approach fixed forms above increasing thresholds.
Chapter 6
Discussion
The studies presented in this thesis were motivated by the lacunae of extreme-value statistics
in measuring the effect of serial dependence in data and in providing statistical assessment of
clustering at high levels. The existing methods are exclusively univariate, and hard to adapt to
nonstationary data, which are frequent in practice.
The thesis proposed two new methods for the estimation of clusters of extremes. The first of
these is a likelihood method, based on the distribution of the truncated inter-exceedance times.
As with likelihood procedures in general, it can easily be adapted for nonstationary data using
either parametric or semiparametric methods, and if the need arises, it can admit covariates
or can be used in Bayesian estimation. This responds to part of our motivation, which was to
develop methods for inference also in our unstable climate.
The application of the likelihood methods involves the selection of a run parameter, as well
as of the threshold. The selection of the run parameter influences the independence of
the resulting sequence of truncated inter-exceedance times, the K-gaps. The selection of the
threshold affects the validity of the model, which holds only in an asymptotic sense. Both of
these imply misspecification of the likelihood, which is constructed from the asymptotically valid
model under the assumption of independence. To these two obvious sources of misspecification,
more can be added, such as nonstationarity, long memory or a mixture character of the
parent distribution of the extremes. These break the fundamental assumptions of the limiting
theorems of extreme-value statistics concerning the extremal index. Introducing the likelihood
methods opens the way to check for such deviations from model validity, using tests based on
the information matrix. These tests, the second new contribution of the thesis to extreme-
value statistics, proved useful in applications, helping to select the best auxiliary parameter
combinations and to detect regions of model misspecification from other sources, mainly from
nonstationarity. Comparisons to theoretical extremal indices, where these were known, and to the
results of a few other methods suggested the adequacy of these tests for finding the best auxiliary
parameter combinations and the good behaviour of the likelihood estimator. Unfortunately, the
need to select a run parameter limits the use of the likelihood methods combined with
smoothers: it is not assured that a common combination can be found which is good at every
time point.
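The K-gaps construction just described can be sketched as follows. This is an illustrative implementation of the point mass-exponential mixture estimator of the extremal index, with the closed-form maximizer obtained from the score equation of that likelihood; the function name and default choices are ours, not taken from the thesis code.

```python
import numpy as np

def kgaps_extremal_index(x, u, K=1):
    """Estimate the extremal index theta from the K-gaps of the series x
    above threshold u.  The K-gaps S = pbar * max(gap - K, 0) are modelled
    as a mixture: zero with probability 1 - theta, Exp(theta) otherwise.
    The closed-form root below solves the score equation of this
    likelihood, capped at 1."""
    x = np.asarray(x, dtype=float)
    exceed = np.flatnonzero(x > u)            # times of exceedances
    if exceed.size < 2:
        raise ValueError("need at least two exceedances")
    pbar = exceed.size / x.size               # estimated P(X > u)
    gaps = np.diff(exceed)                    # inter-exceedance times
    S = pbar * np.maximum(gaps - K, 0)        # normalized K-gaps
    Nc = S.size                               # number of K-gaps
    N1 = np.count_nonzero(S)                  # strictly positive K-gaps
    s = S.sum()
    if s == 0.0:                              # all gaps truncated to zero
        return 0.0
    b = s + Nc + N1
    theta = (b - np.sqrt(b * b - 8.0 * N1 * s)) / (2.0 * s)
    return min(1.0, theta)

# For serially independent data the extremal index is 1:
rng = np.random.default_rng(1)
iid = rng.uniform(size=100_000)
print(kgaps_extremal_index(iid, u=0.98, K=1))   # close to 1
```

In practice one would compute this estimate over a grid of (u, K) values and retain only combinations at which the misspecification tests do not reject.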
The third original contribution of the work is the proposal of an estimation procedure
for multivariate clustering, based on the M4 approximation and using Dirichlet mixtures to
model the finite-threshold noise. Only a first assessment of the potential use of the model was
done. It is the first method for the estimation of the multivariate extremal index, but its main
advantage is the patterns provided by the estimation procedure, which describe the complete
trajectory of the process around extremes. This offers the possibility of obtaining estimates of
cluster characteristics or functionals of clusters, such as loss functions associated with extremes.
Extrapolation may be done using the M4 model defined by the filter matrix parameters found,
whereas estimation at lower levels, although harder, may be done using the fitted Dirichlet mixture
model, which provides a model for the noise. In order to create a method of broad use, however,
more studies are necessary. There are several unresolved issues, such as selection of
the appropriate model complexity, obtaining variances for the estimates, and the large number
of parameters; once these questions are answered, the method may become a powerful tool in
multivariate extreme-value estimation.
Appendix A
The GEV modelling of the
Neuchâtel temperature sequences
Appendix A contains the GEV analysis of the summer daily maximum and minimum temper-
atures from Neuchâtel (Süveges et al., 2008). The only novelty in this analysis is the use of
the simultaneously measured relative air humidity as a covariate, and it has no direct link to
the methodology developed for the estimation of extreme clusters. Since it provides part of the
motivation to investigate possible nonstationarity in the temperature extremes, we nevertheless
present it as auxiliary material.
A.1 Introduction
Attention is increasingly being paid to climatic extremes such as heatwaves, droughts or storms,
not only because of their current importance, though this is a reason for vivid interest, but also
because of potential changes in their severity and duration under scenarios of global climate
change (IPCC, 2007a,b). Rising air temperature trends since the 19th century and recent ob-
served extreme events need careful analysis in order to assess to what extent extreme events
have already changed and whether we should expect such events to become more severe or more
frequent in the future.
The complex patterns of change found in observed temperatures raise the question of how the
tails of these distributions behavewhether they can show behaviour dierent from that of the
mass of the data, and if so, how one can nd an adequate description for these changes for use in
risk assessment. Another interesting issue is the relations between dierent climatic variables.
Statistical methods can provide a better description of a variable of interest if they can capture
essential relationships linking it to other variables: part of the fluctuation in the variable of
interest may be explained by the inclusion of well-chosen explanatory variables. In our climatic
region, extremely hot summer days are generally associated with stable dry conditions, whereas
the lowest summer temperatures are mostly observed when cold fronts bring moisture and rain
(Rebetez, 1996). This suggests the possibility of improving models for extreme temperatures by
including a measure of the mixing ratio of the air on the same day, such as relative air humidity,
since the mixing ratio itself is unobserved.
Physical relationships linking these variables and equations for their development have been
much studied, but statistical analyses of long-term nonstationarity of extreme temperatures
cannot exploit them directly: in most time series, air humidity and daily minimum and max-
imum temperature measurements are not made simultaneously, and the observation times are
unknown. In fact, air humidity values measured at the same moment as the daily extreme
temperatures became available only in the last decades, and only at stations having automatic
measurement facilities. Incorporating physical information directly into statistical analysis of
long-term data is therefore impossible in general. The link between the extremal temperatures
and the air humidity must be based on the empirical construction of joint probability models for
them. This study addresses the assessment of nonstationarity in the extremes of a long sequence
of observed daily maximum and minimum temperatures, and the influence of air humidity on
the distribution of these extremes.
A.2 Statistical methods
According to the goals fixed in Section A.1, we investigated the temporal changes and the
statistical relationships between three simultaneously observed daily sequences from one meteo-
rological station. These were the series of daily minimum and maximum temperatures and the
daily average relative air humidity from the Swiss city of Neuchâtel, between January 1, 1901
and December 31, 2006. The study fits the generalized extreme-value (GEV) distribution to the
monthly maxima of the standardized Tmax series (xTmax) in the three summer months, and
to the negated summer monthly minima of the standardized Tmin series (nTmin). Negating
the monthly minima ensured applicability of the GEV model, which is appropriate for maxima.
Standardization is described in detail in Section 3.2.
For our purposes, which included detection of temporal changes and quantifying the influ-
ence of humidity, we took the three parameters of the GEV model to be time- and humidity-
dependent, and used maximum likelihood methods to find an acceptable model. A crucial
issue is an appropriate parametric formulation for the time- and humidity-dependence of μ(t, h),
σ(t, h) and ξ(t, h), where t denotes time and h denotes humidity. Showing the existence or not
of temporal changes of unknown form in a time series requires an idea well tailored to the data
set. In this case an initial model was obtained by moving window techniques both in time and
in humidity: in fixed-length intervals around each time or humidity, we fitted a stationary GEV
model with constant parameters. The plots of these estimates as functions of time and humidity,
seen in Figure A.1 as black dots, suggested initial mathematical forms for the functions μ(t, h),
σ(t, h) and ξ(t, h): low-order polynomials both in time and humidity, with possibly a breakpoint.
These were then plugged into the GEV model, and their parameters estimated by maximum
likelihood.
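The moving-window initialization described above can be sketched as follows. The scipy-based fit, the window half-width and the minimum block size are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from scipy.stats import genextreme

def moving_window_gev(maxima, covariate, half_width, grid, min_points=30):
    """Fit a constant-parameter GEV to the block maxima whose covariate
    value (time or humidity) lies within +/- half_width of each grid
    point.  Returns one (mu, sigma, xi) triple per grid point; note that
    scipy's genextreme uses the shape convention c = -xi."""
    maxima = np.asarray(maxima, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    fits = []
    for g in grid:
        block = maxima[np.abs(covariate - g) <= half_width]
        if block.size < min_points:          # too few maxima for a stable fit
            fits.append((np.nan, np.nan, np.nan))
            continue
        c, loc, scale = genextreme.fit(block)
        fits.append((loc, scale, -c))        # convert back to the xi convention
    return np.array(fits)
```

Plotting the returned triples against the grid, as in Figure A.1, then suggests parametric forms for μ(t, h), σ(t, h) and ξ(t, h).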
A.3 Results
We found the following final model for xTmax,

    ξ_X(t, h) = ξ_X,    σ_X(t, h) = σ_X,    μ_X(t, h) = μ_X + β_X h,

while for nTmin the final model was the slightly more complex

    ξ_N(t, h) = ξ_N,
    σ_N(t, h) = σ_N + α_N (t − 70) I(t > 70),
    μ_N(t, h) = μ_N + β_N (h − 75)² I(h < 75),                    (A.1)

with t in years counted from 1901, h in percent, and I(x ∈ A) denoting the indicator function of
x falling into the set A. Both shape parameters ξ_X, ξ_N and the scale parameter σ_X were found
to be constants that do not depend on time or humidity. The estimated coefficients and their
standard errors are listed in Table A.1.
Generalized extreme-value models may be most easily visualized through their quantile levels.
The quantile function for a given probability level p is given by

    x_p = μ − (σ/ξ)(1 − y_p^{−ξ}),    ξ ≠ 0,
    x_p = μ − σ log y_p,              ξ = 0,

with y_p = −log p. In the nonstationary case and with the additional dependence on humidity,
the return levels seem to have no straightforward interpretation: any return level calculated from
                     xTmax                                     nTmin
Parameter    Estimate    Standard error      Parameter    Estimate        Standard error
ξ_X          −0.28       0.05                ξ_N          −0.35           0.02
σ_X           0.40       0.03                σ_N           0.47           0.01
                                             α_N           5.68 × 10⁻³    3.21 × 10⁻³
μ_X           2.93       0.32                μ_N           1.62           0.08
β_X          −0.027      0.005               β_N          −7.95 × 10⁻⁴    2.75 × 10⁻⁴

Table A.1: Parameter estimates and their standard errors for the best models for the xTmax and nTmin
data.
Figure A.1: Nonconstant parameters in the GEV fits. Left panel: the humidity-dependence of the loca-
tion parameter of xTmax (thick red line), together with the estimates from the stationary moving window
GEV fits (black dots). Middle panel: the humidity-dependence of the location parameter of nTmin (thick
blue line) and the estimates from the stationary moving window GEV fits (black dots). Right panel: the
time-dependence of the scale parameter of nTmin (thick blue line) and the estimates from the stationary
moving window GEV fits (black dots). The horizontal segments represent the 95% confidence intervals
from the stationary moving window GEV fits; the thin solid lines are the 95% confidence intervals of the
nonstationary GEV fits in all panels.
the fitted μ(t, h), σ(t, h) and ξ(t, h) functions depends on t and h, and these vary over the time
interval of the observations or the forecast. Also, to find out about the high quantiles of the
original temperatures, we must put back the removed trend and seasonal component by applying
the inverse standardizing transformations to the return levels of the standardized variables. This
introduces time dependence even though the extreme-value distribution of xTmax does not
depend on time. Thus, the return levels can be considered only for a given time t and humidity
Figure A.2: 0.97-quantile for xTmax (left) and 0.03-quantile for nTmin (right) versus humidity on
the original temperature scale. The July 20, 2006 trend and seasonal values were used to recalculate
temperatures in Celsius.
h: if we observe the monthly hottest temperature on day t with humidity h, then this will be
larger than the p-quantile with probability 1 − p. Seeing these levels as functions of time and
humidity can be informative even if return levels themselves are not so useful. We illustrate our
results by plotting the 0.97-quantile for the xTmax series and the 0.03-quantile for the nTmin
series as functions of humidity and time.
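The quantile formula above translates directly into code. This sketch assumes constant parameters; in the study μ, σ and ξ depend on t and h, so one would evaluate the fitted μ(t, h), σ(t, h) and ξ(t, h) at the chosen time and humidity, and then apply the inverse standardizing transformation to return to the original temperature scale.

```python
import numpy as np

def gev_quantile(p, mu, sigma, xi):
    """p-quantile x_p of the GEV: the monthly extreme falls below x_p
    with probability p, so it exceeds x_p with probability 1 - p.
    Implements x_p = mu - (sigma/xi) (1 - y_p^{-xi}) with y_p = -log p,
    and the Gumbel limit mu - sigma log y_p when xi = 0."""
    y = -np.log(p)
    if abs(xi) < 1e-12:
        return mu - sigma * np.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))
```

With a negative shape parameter, letting p → 1 gives the finite upper endpoint μ − σ/ξ of the fitted distribution.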
Figure A.2 shows the humidity-dependence of the 0.97-quantile of the GEV distribution on
the original temperature scale, with the trend and seasonal components for July 20, 2006. This
choice of date does not influence the pattern of the humidity-dependence, only its position.
The curve would be roughly equivalent to the 10-year return level values, if the 2006 climate
characteristics remain stable in the future and the observed monthly maximum occurs around
July 20. The approximately 6°C temperature span of the curve for xTmax over the humidity
range emphasizes the importance of the association between humidity and heatwaves: this is
the span of variation of the hottest temperature extremes that can be associated with varying
air humidity.
Figure A.3 shows the 0.97- and 0.03-quantiles as functions of time. To reconstruct the
original temperature scale, we again used the July 20 trend and seasonal values from each year
of the period 1901–2006; for other dates of the summer, the quantile level would be parallel to
this choice, somewhat below the plotted one, since the end of July is typically the hottest period
of the summer. The curves were calculated for a low (50%) and a high (80%) humidity state.
Figure A.4 compares the median, the 0.97- and the 0.99-quantiles and the endpoint of the
distribution over the summers of 1970 and 2000. The existence of the endpoint is implied
Figure A.3: The 0.97-quantile of xTmax (left) and 0.03-quantile of nTmin (right) as functions of time
on the original temperature scale, for two different air humidities. For each year, the July 20 trend and
seasonal values were used to recalculate temperatures in Celsius. In each panel the solid lines correspond
to 80% and the dashed lines to 50% air humidity, the thick line being the estimate and the thin lines the
95% confidence bounds.
by the negative shape parameter (cf. Table A.1). This is a probabilistic limit for a given time
and humidity: the varying location and scale parameters and the trend in the complete data set
cause the boundary, beyond which observations may not occur, to change. For xTmax, there
is essentially no change in the quantile curves between 1970 and 2000. The only shift is with
humidity: the quantiles move strongly downward with growing humidity, indicating the colder
monthly maximum temperatures when observed together with higher air humidity. For nTmin,
the increasing scale parameter does imply visible temporal changes: for both humidity values,
the quantile lines are farther from each other in 2000 than they were in 1970, with the endpoint
remaining more or less stable, and the others shifting upwards.
Our choice of a block length of one month, instead of the commonly used one year, meant that
we have three observations per year per sequence. With such short blocks, the validity of the
GEV model might be doubted, since the GEV is a limiting model expected to provide good
Figure A.4: The upper four panels show the median (black line), the 0.97-quantile (brown), the 0.99-
quantile (red) and the endpoint (orange) of the fitted distribution of xTmax versus the day of summer on
the original temperature scale, for 1970 (left) and 2000 (right), and for air humidities 50% (upper) and
80% (lower). For both years, the daily trend and seasonal values of that year were used to recalculate
temperatures in Celsius. The grey spikes are the observed Tmax values. The lower plots show the
median (black), 0.03-quantile (violet), 0.01-quantile (dark blue) and endpoint (light blue) for nTmin, on
the background of the observed Tmin series (grey spikes).
Figure A.5: Probability-probability plots for the best GEV fits for the standardized xTmax (left) and
nTmin (right).
approximations for maxima of long blocks. However, the probability-probability plots in Figure
A.5 show that the GEV is an acceptable model for both the negated nTmin and the xTmax
sequences.
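Such a probability-probability plot can be computed as follows. The plotting positions i/(n + 1) and the use of scipy's genextreme (whose shape convention is c = −ξ) are illustrative choices, not details taken from the study.

```python
import numpy as np
from scipy.stats import genextreme

def pp_plot_points(sample, mu, sigma, xi):
    """Probability-probability plot coordinates for a GEV fit: the
    fitted distribution function evaluated at the ordered sample,
    against the plotting positions i/(n + 1).  Points close to the
    diagonal indicate an acceptable fit."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    theoretical = np.arange(1, n + 1) / (n + 1.0)
    fitted = genextreme.cdf(x, -xi, loc=mu, scale=sigma)
    return theoretical, fitted
```

For a covariate-dependent fit, each observation would first be transformed by its own fitted distribution function before comparison with the uniform plotting positions.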
A.4 Discussion
Our results show two central points:
Only the location parameters of both nTmin and xTmax are influenced by humidity, and
the form of dependence is different. The change of the location parameter with humidity
implies a shift of the probability distributions and thus of the quantiles of the extremes:
the hottest standardized anomaly of a summer month tends to be higher if it happens on
a dry day than on a wet day. Also, the coldest standardized anomaly of a summer month
is warmer if it occurs on a dry day than on a wet day. Hot spells, during which one is more
likely to observe the monthly highest temperature, are usually associated with stabilized
high-pressure structures, while the month's coldest temperatures are observed most often
during cold fronts bringing high-humidity conditions and precipitation. The advantage of
the inclusion of humidity as a covariate in the modelling of temperature extremes is the
quantification of their relationship, and the possibility of using the additional information
in humidity on extreme days to reduce the variability of the estimates.
Only the scale parameter of nTmin changed in time, from around 1970. The fitting of the
nonstationary GEV model showed this effect to be highly significant, implying increasing
variability of the observed coldest monthly temperatures from around 1970 onwards.
Figures A.1 and A.2 show a stable location parameter and therefore stabilized expected
temperature and quantile levels for the coldest monthly minimal temperatures when associated
with humidity over 70%. These are days when the mean humidity is close to saturation; as
the relative humidity changes strongly with air temperature, and near the time of the measured
daily Tmin the relative humidity can reach 100%, condensation then begins, stabilizing the
temperature and creating the horizontal part of the quantile curve.
The dots in the middle panel of Figure A.1, representing the moving window stationary fits for
the standardized variables, also show an additional drop of the location parameter at humidities
over 90%. This effect is not included in our model. Although the number of observations allowed
us to fit complex models, there were only a few humidities over 90%, and their quality may be
lower. For these reasons, in this humidity interval we adopted an initial model with a constant
location parameter, which anyway was compatible with the confidence intervals from the moving
window fits. The best model (A.1) thus does not reflect this feature.
The temporal dependence of the quantile levels is different for xTmax and nTmin. In the
case of Tmax the origin of all changes is the trend and the seasonal variation, since the GEV
distribution was found to be stationary. For Tmin, we found a marked warming trend for the
bulk of the data and, at the same time, an increasing scale parameter for extremes from around
1970. The calculation of quantile functions described in the Appendix leads to the conclusion
that for these last years, the two effects partly compensate each other: the trend shifts the
quantiles upward, acting uniformly at all levels, while the increasing scale parameter pushes the
low quantiles downward, but differently at each level. The result is a weaker warming trend
in the lower quantiles than for the whole Tmin series; at the lowest levels this warming even
vanishes. The middle of the distribution is warming, while the endpoint weakly decreases from
that time: the lower tail of the distribution is slowly stretching with time.
A number of previous studies (Brown et al., 2008; Kharin and Zwiers, 2005; Nogaj et al., 2006)
found that the location parameter depends linearly on time for both minimum and maximum
temperatures; this corresponds to our trend component, which is near-zero in Tmax and, due to
the moving window technique, nonlinear and nonzero in Tmin. An overall trend acting uniformly
on all temperatures, if not removed, will cause a shift in the location parameter of the GEV
distribution of the extremes. The new conclusion of the analysis is the nonstationarity in the
scale parameter of monthly extreme values; the ability to discover it is due to the combined
advantages of the standardizing technique hitherto not applied, the moving window methods
to set up the initial model that gave a good idea about possible dependence forms on time
and humidity, and the inclusion of humidity as a covariate, which explains part of the inherent
variability of the temperature process.
A.5 Summary
Our study investigates the possible nonstationarity in extreme levels of the daily minimum and
maximum temperatures at Neuchâtel, additional to the trend and seasonal changes in the mean
and the variance, and the relationship of the extreme-value characteristics to humidity. First
standardizing the sequences allowed us to compare directly extremes in different months and
years. We could then apply the GEV model with covariates to the monthly coldest and hottest
temperature observations in summer. Our results allow us to show distributional changes in
time and a strong association between summer monthly extreme temperatures and humidity.
Distributional changes in time:
For Tmin, the median line of the distribution of the complete data set is shifted upwards,
and most of the change occurs during the 1980s. Over the past three decades the variability
of the monthly coldest summer temperatures has been increased by an extremal trend
additional to the overall trend in Neuchâtel, so its quantiles are now farther apart than
in the past, indicating more variable extremely cold nights now than at the beginning of
the century. For Tmax, we found near-stable temporal behaviour both for the bulk of the
series and the extremes.
Strong association with humidity:
By including air humidity as a covariate, we isolated an important source of variability;
varying humidity implies a change of almost 6°C in the expected monthly maximal
temperature and more than 2°C in the expected monthly minimal temperature. By varying
humidity we can explain part of the variability of extremes and can draw a clearer picture
of temporal changes. This, together with the use of an overall de-seasonalizing and de-
trending procedure, has led to the identification of the increasing variance of the extreme
cold summer temperatures. The dependence on humidity differs for hot and cold extremes;
the difference might be explained by the different atmospheric processes acting during the
occurrence of these extremal temperatures.
This result also hints at the importance of any factors that influence air humidity (vegeta-
tion type, soil moisture, land-atmosphere coupling, and so on) in the forecast of extremal
temperature quantiles in the future, and from another point of view, corroborates evidence
emerging from the comparison of coupled and uncoupled numerical models (Seneviratne et al.,
2006). Due to the strong stochastic association between hot temperature extremes and hu-
midity, a shift in the average humidity level of a site may imply hotter temperatures during
heatwaves. Feedbacks between air humidity and other influential factors like vegetation
can also have a powerful effect on the climate shift at extreme levels.
The analysis suggests that better assessments of future temperature or other extremes may
be obtained by incorporating other strongly associated climatic variables into the extreme-value
model. To achieve this, the method needs additional input, such as assumptions about the
extrapolation of the humidity dependence of the extreme-value parameters, and about trends and
seasonal components for both the mean and the variance of future temperatures. The first of these
may come from physical considerations or from fits of similar models for other stations. Information
on future temperature medians and variances may be drawn from regional or global climate
models. The third main component, extrapolation of the changes in the extreme-value distribution,
requires well-founded arguments outside the scope of this paper. There are also many
ways to refine the method itself, including generalized additive modelling (Chavez-Demoulin and
Davison, 2005), modelling of the clustering of rare events (Ferro and Segers, 2003; Süveges, 2007),
and spatial (Schlather and Tawn, 2003) or multivariate modelling of extremes (Coles and Tawn,
1994), which promise new insights into the future behaviour of temperature extremes, based on
both atmospheric physics and statistics.
Appendix B
The EM algorithm
The EM algorithm (Dempster et al., 1977) is a general approach to the computation of maximum
likelihood estimates when the observations can be considered as incomplete data. Suppose that
we are dealing with data from a family of densities f(x | θ) depending on a parameter vector
θ. Instead of directly observing x, we have observations only of some function y = y(x).
In general, this yields only partial information about x, and x is known only to lie in X(y), the
set solving the equation y = y(x). The variables x are called the complete data, and the observed
y the incomplete data. The latter has density g(y | θ), which is related to the complete-data
density by

g(y | θ) = ∫_{X(y)} f(x | θ) dx.

The aim is to find the maximum likelihood estimate θ̂ of θ which maximizes the incomplete-data
likelihood g(y | θ) given the observed data y. The EM algorithm does so by alternating
two steps, making use of the complete-data specification. The first step is the calculation of the
expected value of the sufficient statistic of the complete data, given the observations and a value
for the parameter vector. The second step consists of maximizing the complete-data likelihood,
replacing the sufficient statistic in it by its expected value as calculated in the previous step.

When the complete-data density belongs to a regular exponential family, the EM algorithm
is particularly simple. In that case, the complete-data density has the form

f(x | θ) = b(x) exp{t(x)^T φ(θ)},

with φ(θ) an r-dimensional parameter vector, the natural parameter of the family, t(x) the
r-dimensional vector of sufficient statistics, and the superscript T denoting transposition.
Suppose that θ^(i) is some initial value or an intermediate result from an iteration step for the
parameter vector. Then the EM algorithm can be formalized as follows:

E-step: estimate the complete-data sufficient statistic t(x) by t^(i) = E{t(x) | y, θ^(i)};

M-step: determine the next estimate θ^(i+1) of the parameter vector by solving
E{t(x) | θ} = t^(i) for θ.
The maximization step here corresponds to maximum likelihood estimation in exponential
families. The EM algorithm can be generalized to the case in which the complete-data
likelihood does not belong to an exponential family. Introduce the function Q(θ′ | θ) by

Q(θ′ | θ) = E{log f(x | θ′) | y, θ}.

Then the EM algorithm is modified to the following alternating steps:

E-step: compute Q(θ | θ^(i));

M-step: let the next estimate θ^(i+1) be the parameter value that maximizes Q(θ | θ^(i)) as a
function of θ.

These steps are repeated until convergence, in practice until either |θ^(i+1) − θ^(i)| / |θ^(i+1)|
or {log g(y | θ^(i+1)) − log g(y | θ^(i))} / log g(y | θ^(i+1)) falls below a pre-specified tolerance limit.
The algorithm has a number of good properties.
(EM1) For all i, log g(y | θ^(i+1)) ≥ log g(y | θ^(i)).

(EM2) A maximum likelihood estimate is a fixed point of the algorithm.

(EM3) If the sequence log g(y | θ^(i)) is bounded and, moreover, there exists λ > 0 such that for all
i,

Q(θ^(i+1) | θ^(i)) − Q(θ^(i) | θ^(i)) ≥ λ (θ^(i+1) − θ^(i)) (θ^(i+1) − θ^(i))^T,

then the sequence θ^(i) converges to some fixed point in the closure of the parameter space.

(EM4) Assume that up to the third derivative, all derivatives of the complete-data log-likelihood,
of the incomplete-data log-likelihood, of the conditional log-likelihood of the unobserved
variables and of the expected values used in the algorithm exist and are continuous. Assume
also that derivatives and expected values are exchangeable. Then if the algorithm
converges to some θ* such that the first derivative of Q(θ^(i+1) | θ^(i)) is zero and its second
derivative is negative definite with eigenvalues bounded away from zero, θ* is a solution of the
score equation for the incomplete-data likelihood, and the second derivative of Q(θ* | θ*)
is negative definite.
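To make the two steps concrete, here is a small illustrative sketch (in Python; the code is not part of the thesis) of the classic multinomial example from Dempster et al. (1977), in which the first of four observed cell counts is the sum of two unobserved complete-data cells. Both steps then have closed forms, and the stopping rule is the relative parameter change described above.

```python
def em_linkage(y=(125, 18, 20, 34), theta=0.5, tol=1e-8, max_iter=200):
    """EM for the multinomial linkage example of Dempster et al. (1977).

    The first observed cell, with probability 1/2 + theta/4, is split into
    two unobserved cells with probabilities 1/2 and theta/4; the E-step
    imputes the expected count of the theta/4 sub-cell, and the M-step is
    a closed-form maximization of the complete-data likelihood.
    """
    y1, y2, y3, y4 = y
    for i in range(max_iter):
        # E-step: expected count falling in the theta/4 sub-cell of cell 1
        x2 = y1 * (theta / 4) / (1 / 2 + theta / 4)
        # M-step: complete-data MLE of theta given the imputed count
        theta_new = (x2 + y4) / (x2 + y2 + y3 + y4)
        # stopping rule: relative change in the parameter, as in the text
        if abs(theta_new - theta) / abs(theta_new) < tol:
            return theta_new, i + 1
        theta = theta_new
    return theta, max_iter
```

The iterates increase the incomplete-data log-likelihood at every step, as property (EM1) guarantees, and converge to the root of the score equation near 0.6268.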
Application to a mixture density is straightforward. Suppose that the sample {y_1, ..., y_n}
originates from an M-component mixture, and the component generating observation y_i is
unknown. The observed sample {y_1, ..., y_n} can be considered as the incomplete data associated
with a complete data set {x_i} = {(y_i, z_i)}, where z_i is the indicator of the component that
generated y_i. The variable z_i can take the possible values r_1, r_2, ..., r_M defined by

r_1 = (1, 0, ..., 0),
r_2 = (0, 1, ..., 0),
...
r_M = (0, 0, ..., 1),

with z_i = r_m indicating that y_i was generated from component m.
The complete-data likelihood is most easily expressed with the conditional densities. Suppose
that the {z_i} are an independent, identically distributed sample from a distribution v(z | θ), and that
conditional on z_i, the y_i are independent with density u(y | z_i, θ). Introduce the notations

U(y_i | θ) = (log u(y_i | r_1, θ), ..., log u(y_i | r_M, θ))

and

V(θ) = (log v(r_1 | θ), ..., log v(r_M | θ)).

The complete-data log-likelihood is then

log f(x | θ) = Σ_{i=1}^n z_i^T U(y_i | θ) + Σ_{i=1}^n z_i^T V(θ).
The E-step of the EM algorithm requires estimating the unobserved variables z_i given the sample
y and the current value of the parameter θ; the estimate equals the vector of conditional
probabilities that y_i belongs to each of the components 1, ..., M. The maximization step consists of
maximizing

log f(x | θ) = Σ_{i=1}^n ẑ_i^T {U(y_i | θ) + V(θ)},   (B.1)

where ẑ_i is the estimated state vector for observation y_i, provided by the E-step.
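In code, the E-step and the multinomial part of the M-step take a particularly compact form. The sketch below (Python with NumPy; an illustrative sketch, not the thesis's implementation) computes the estimated state vectors ẑ_i from a matrix of component log-densities and updates the component probabilities; working on the log scale and subtracting the row maximum guards against underflow for very small densities.

```python
import numpy as np

def e_step(log_u, pi):
    """E-step for a generic M-component mixture.

    log_u : (n, M) array, log u(y_i | r_m, theta) for each observation i and
            component m.
    pi    : (M,) array of component probabilities v(r_m | theta).
    Returns the (n, M) matrix of conditional component probabilities z_hat.
    """
    log_post = log_u + np.log(pi)                    # log of pi_m * u_m(y_i)
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)    # normalize each row

def m_step_weights(z_hat):
    """M-step for the multinomial part: maximizes sum_i z_hat_i^T V(pi)."""
    return z_hat.mean(axis=0)
```

The M-step for the component densities depends on the chosen family and is done separately, as in the Dirichlet case described next.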
The Dirichlet mixture model proposed by the thesis is formalized by taking a multinomial
choice of the state vector z_i for each observation y_i as the distribution v(z_i | θ). Given z_i, the
conditional density of the K-vector y_i is then a Dirichlet distribution. The parameter vector
can be decomposed as θ = (π, α), with π = (π_1, ..., π_M) holding the probabilities of the M
components and α = (α_11, ..., α_1K, α_21, ..., α_2K, ..., α_M1, ..., α_MK) the vector of the Dirichlet
parameters. In this case, we see that

V(θ) = V(π) = (log π_1, ..., log π_M)

and

U(y_i | θ) = U(y_i | α) = (log u_1(y_i | α_1), ..., log u_M(y_i | α_M))

with

log u_m(y_i | α_m) = log Γ(Σ_{k=1}^K α_mk) − Σ_{k=1}^K log Γ(α_mk) + Σ_{k=1}^K (α_mk − 1) log y_ik,

where y_ik denotes the kth component of the vector y_i. Thus, in the maximization step we can optimize
the two terms of the complete-data log-likelihood (B.1) separately, one for the parameters π
of the multinomial distribution of the unobserved vector z_i, and the other for the parameter
vector α of the Dirichlet densities. Moreover, the latter further splits into M separate
optimization tasks, since the mth component of the vector U(y_i | α) depends only on the vector
(α_m1, ..., α_mK), and not on other elements of the vector α.
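The component log-density above translates directly into code. The following Python function (an illustrative sketch, not the thesis code) evaluates log u_m(y_i | α_m) for one component via the log-gamma function, which avoids overflow for large Dirichlet parameters:

```python
from math import lgamma, log

def dirichlet_logpdf(y, alpha):
    """log u_m(y_i | alpha_m): the Dirichlet log-density displayed in the text.

    y     : K-vector on the unit simplex (positive components summing to one).
    alpha : K-vector of positive Dirichlet parameters for one component.
    """
    return (lgamma(sum(alpha))                         # log Gamma(sum_k alpha_mk)
            - sum(lgamma(a) for a in alpha)            # - sum_k log Gamma(alpha_mk)
            + sum((a - 1) * log(yk)                    # + sum_k (alpha_mk - 1) log y_ik
                  for a, yk in zip(alpha, y)))
```

For instance, with all parameters equal to one the Dirichlet is uniform on the simplex and the log-density is zero everywhere.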
If the incomplete-data likelihood is too involved, the variance-covariance matrix can be
obtained by using the decomposition (Oakes, 1999)

∂² log g(y | θ) / ∂θ ∂θ^T = { ∂² Q(θ | θ̄) / ∂θ ∂θ^T + ∂² Q(θ | θ̄) / ∂θ ∂θ̄^T } |_{θ̄ = θ}.

The first term on the right-hand side can be obtained as a byproduct of the numerical procedures
of the EM algorithm. The second term must be calculated analytically, or approximated
by (Tanner, 1996; Boldi, 2004)

∂² Q(θ | θ̄) / ∂θ ∂θ̄^T |_{θ̄ = θ} = Σ_{i=1}^n var{ ∂ log f(x_i | θ) / ∂θ },

where the variance is taken with respect to the conditional distribution of z_i given y_i and θ.
In the case of the Dirichlet mixtures used in the thesis, the variance-covariance matrix of the
estimates was calculated analytically from the incomplete-data log-likelihood.
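When, as in the last sentence, the incomplete-data log-likelihood itself is tractable, the observed information can also be checked numerically. A generic central-difference sketch for the one-dimensional case (Python; illustrative only, not the thesis's analytic calculation):

```python
def observed_information(loglik, theta, h=1e-5):
    """Observed information -d^2 loglik / d theta^2 by central differences.

    loglik : callable returning the incomplete-data log-likelihood at a scalar
             parameter value.
    theta  : point at which to evaluate, typically the maximum likelihood
             estimate.
    """
    d2 = (loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h**2
    return -d2
```

The reciprocal of this value approximates the variance of the estimate; in higher dimensions the same differencing applied coordinate-wise yields the full Hessian.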
The number of components M is in general unknown in advance, and must be selected
after fitting a few models with reasonable values of M. The literature offers several criteria for
the selection. The first is the Akaike information criterion applied to mixtures
(McLachlan and Peel, 2000). It is defined as

AIC = −2 log g_m(y | θ̂) + 2p,   (B.2)

where g_m(y | θ̂) is the maximized likelihood of the fitted mixture of m components at the
fitted parameter value θ̂, and p is the number of unconstrained parameters in the model. The
best number of components is that at which AIC attains its minimum. Since the AIC in regression
models tends to select too many parameters, the literature offers other choices based on other
principles too, one of which is the Bayes information criterion:

BIC = −2 log g_m(y | θ̂) + p log n,   (B.3)

with the same notation as in equation (B.2) and n denoting the sample size. The performance
of these two criteria was compared on examples simulated from Dirichlet mixture distributions.
These showed that AIC indeed tends to select overfitted models, especially for small n, whereas
BIC is more likely to find the right number of components.
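Equations (B.2) and (B.3) are trivial to compute once the maximized log-likelihood is available; a small Python sketch (the numeric values in the test are hypothetical, for illustration only):

```python
from math import log

def aic(loglik, p):
    """Akaike information criterion (B.2): -2 log-likelihood plus 2p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayes information criterion (B.3): the penalty grows with sample size n."""
    return -2.0 * loglik + p * log(n)
```

For fixed data, the model minimizing the chosen criterion over candidate values of M is retained; since BIC's penalty p log n exceeds AIC's 2p whenever n > 8, BIC prefers smaller models.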
Bibliography

M. Aitkin, D. Anderson, and J. Hinde. Statistical modelling of data on teaching styles. Journal of the Royal Statistical Society, Series A (General), 144(4):419–461, 1981.

C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2:1152–1174, 1974.

I. Auer, R. Böhm, A. Jurkovic, W. Lipa, A. Orlik, R. Potzmann, W. Schöner, M. Ungersböck, C. Matulla, K. Briffa, P. Jones, D. Efthymiadis, M. Brunetti, T. Nanni, M. Maugeri, L. Mercalli, O. Mestre, J. M. Moisselin, M. Begert, G. Müller-Westermeier, V. Kveton, O. Bochnicek, P. Stastny, M. Lapin, S. Szalai, T. Szentimrey, T. Cegnar, M. Dolinar, M. Gajic-Capka, K. Zaninovic, Z. Majstorovic, and E. Nieplova. HISTALP – historical instrumental climatological surface time series of the Greater Alpine Region. International Journal of Climatology, 27:17–46, 2007.

A. Balkema and S. Resnick. Max-infinite divisibility. Journal of Applied Probability, 14:309–319, 1977.

G. Balkema and P. Embrechts. High Risk Scenarios and Extremes. Zürich Lectures in Advanced Mathematics. European Mathematical Society Publishing House, 2007.

M. Begert, T. Schlegel, and W. Kirchhofer. Homogeneous temperature and precipitation series of Switzerland from 1864 to 2000. International Journal of Climatology, 25:65–80, 2005. doi:10.1002/joc.1118.

J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of Extremes. John Wiley, 2004.

S. M. Berman. Convergence to bivariate limiting extreme value distributions. Annals of the Institute of Statistical Mathematics, 13:217–223, 1961.

S. M. Berman. Limit theorems for the maximum term in stationary sequences. Annals of Mathematical Statistics, 35:502–516, 1964.

M.-O. Boldi. Mixture Models for Multivariate Extremes. PhD thesis, École Polytechnique Fédérale de Lausanne, 2004.

M.-O. Boldi and A. C. Davison. A mixture model for multivariate extremes. Journal of the Royal Statistical Society, Series B, 69:217–229, 2007. doi:10.1111/j.1467-9868.2007.00585.x.

A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis: the Kernel Approach with S-Plus Illustrations. Oxford: Clarendon Press, 1997.

S. J. Brown, J. Caesar, and C. A. T. Ferro. Global changes in extreme daily temperature since 1950. Journal of Geophysical Research, 113:D05115, 2008. doi:10.1029/2006JD008091.

V. Chavez-Demoulin and A. C. Davison. Generalized additive modeling of sample extremes. Applied Statistics, 54:207–222, 2005.

M. R. Chernick, T. Hsing, and W. P. McCormick. Calculating the extremal index for a class of stationary sequences. Advances in Applied Probability, 23:835–850, 1991.

S. G. Coles. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London, 2001.

S. G. Coles and J. A. Tawn. Modeling extreme multivariate events. Journal of the Royal Statistical Society, Series B, 53:377–392, 1991.

S. G. Coles and J. A. Tawn. Statistical methods for multivariate extremes: An application to structural design. Applied Statistics, 43:1–48, 1994.

D. R. Cox and V. Isham. Point Processes. Chapman and Hall, London, 1980.

S. R. Dalal. A note on the adequacy of mixtures of Dirichlet processes. Sankhyā, 40:185–191, 1978.

S. R. Dalal and G. J. Hall. On approximating parametric Bayes models by nonparametric Bayes models. Annals of Statistics, 8:664–672, 1980.

D. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods. Springer, 2nd edition, 2003.

A. C. Davison. Statistical Models. Cambridge University Press, Cambridge, 2003.

A. C. Davison and N. I. Ramesh. Local likelihood smoothing of sample extremes. Journal of the Royal Statistical Society, Series B, 62:191–208, 2000.

A. C. Davison and R. L. Smith. Models for exceedances over high thresholds (with discussion). Journal of the Royal Statistical Society, Series B, 52:393–442, 1990.

L. de Haan. On Regular Variation and Its Application to the Weak Convergence of Sample Extremes, volume 32 of Mathematical Centre Tracts. Mathematics Centre, Amsterdam, Holland, 1970.

L. de Haan. Equivalence classes of regularly varying functions. Journal of Stochastic Processes and Applications, 2:243–359, 1974.

L. de Haan. Sample extremes: An elementary introduction. Statistica Neerlandica, 30:161–172, 1976.

L. de Haan. A spectral representation for max-stable processes. Annals of Probability, 12:1194–1204, 1984.

L. de Haan. Fighting the arch-enemy with mathematics. Statistica Neerlandica, 44:45–68, 1990.

L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer-Verlag, Berlin, 2006.

L. de Haan and J. Pickands. Stationary min-stable stochastic processes. Probability Theory and Related Fields, 72:477–492, 1986.

L. de Haan and S. I. Resnick. Limit theory for multivariate sample extremes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 40:317–337, 1977.

L. de Haan, S. I. Resnick, H. Rootzén, and C. G. de Vries. Extremal behaviour of solutions to a stochastic difference equation with application to ARCH processes. Stochastic Processes and their Applications, 32:213–224, 1989.

P. Deheuvels. Point processes and multivariate extreme values. Journal of Multivariate Analysis, 13:257–272, 1983.

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.

A. Ehlert and M. Schlather. Capturing the multivariate extremal index: bounds and interconnections. Extremes, 2008. doi:10.1007/s10687-008-0062-6.

P. Embrechts, C. Klüppelberg, and T. Mikosch. Modeling Extremal Events for Insurance and Finance. Springer-Verlag, Berlin, 1997.

M. Falk, J. Hüsler, and R.-D. Reiss. Laws of Small Numbers: Extremes and Rare Events. Birkhäuser, 2004.

J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Chapman and Hall, London, 1996.

T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1:209–230, 1973.

C. A. T. Ferro. Statistical Methods for Clusters of Extreme Values. PhD thesis, Lancaster University, September 2003.

C. A. T. Ferro and J. Segers. Inference for clusters of extreme values. Journal of the Royal Statistical Society, Series B, 65:545–556, 2003.

R. A. Fisher and L. H. C. Tippett. Limiting forms of the frequency distributions of the largest or smallest member of a sample. Proceedings of the Cambridge Philosophical Society, 24:180–190, 1928.

M. Fréchet. Sur la loi de probabilité de l'écart maximum. Annales de la Société Polonaise de Mathématiques, 6:93–116, 1927.

J. Galambos. The Asymptotic Theory of Extreme Order Statistics. John Wiley, 1978.

L. A. Gil-Alana. Time trend estimation with breaks in temperature time series. Climatic Change, 89:325–337, 2008. doi:10.1007/s10584-008-9407-z.

B. V. Gnedenko. Sur la distribution limite du terme maximum d'une série aléatoire. Annals of Mathematics, 44:423–453, 1943.

P. J. Green and B. W. Silverman. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman and Hall, 1994.

E. J. Gumbel. Distributions de valeurs extrêmes en plusieurs dimensions. Publications of the Institute of Statistics of the University of Paris, 9:171–173, 1960.

E. J. Gumbel. Statistics of Extremes. Columbia University Press, 1958.

P. Hall and N. Tajvidi. Nonparametric analysis of temporal trend when fitting parametric models to extreme-value data. Statistical Science, 15:153–167, 2000a.

P. Hall and N. Tajvidi. Distribution and dependence function estimation for bivariate extreme value distributions. Bernoulli, 6:835–844, 2000b.

J. Heffernan and J. A. Tawn. A conditional approach for multivariate extreme values (with discussion). Journal of the Royal Statistical Society, Series B, 66:497–546, 2004.

J. Heffernan, J. A. Tawn, and Z.-Y. Zhang. Asymptotically (in)dependent multivariate maxima of moving maxima processes. Extremes, 10(1–2):57–82, 2007. doi:10.1007/s10687-007-0035-1.

J. R. M. Hosking, J. R. Wallis, and E. F. Wood. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics, 27:251–261, 1985.

T. Hsing. On the characterization of certain point processes. Stochastic Processes and their Applications, 26:297–316, 1987.

T. Hsing. Estimating the parameters of rare events. Stochastic Processes and their Applications, 37(1):117–139, 1991.

T. Hsing. Extremal index estimation for a weakly dependent stationary sequence. Annals of Statistics, 21:2043–2071, 1993.

T. Hsing, J. Hüsler, and M. R. Leadbetter. On the exceedance point process for a stationary sequence. Probability Theory and Related Fields, 78:97–112, 1988.

J. Hüsler. Multivariate extreme values in stationary random sequences. Stochastic Processes and their Applications, 35:99–108, 1990.

IPCC. Climate Change 2007. Impacts, Adaptation and Vulnerability. Cambridge, 2007a.

IPCC. Climate Change 2007. The Scientific Basis. Cambridge, 2007b.

H. Joe, R. L. Smith, and I. Weissman. Bivariate threshold methods for extremes. Journal of the Royal Statistical Society, Series B, 54:171–183, 1992.

O. Kallenberg. Random Measures. Akademie-Verlag, Berlin, 1983.

V. V. Kharin and F. W. Zwiers. Estimating extremes in transient climate change simulations. Journal of Climate, 18:1156–1173, 2005.

A. N. Kolmogorov. Foundations of the Theory of Probability. Chelsea, New York, 1933.

F. Laurini and J. A. Tawn. New estimators for the extremal index and other cluster characteristics. Extremes, 6:189–211, 2003.

M. R. Leadbetter. On extreme values in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 28:289–303, 1974.

M. R. Leadbetter. Extremes and local dependence in stationary sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 65:291–306, 1983.

M. R. Leadbetter and S. Nandagopalan. On exceedance point processes for stationary sequences under mild oscillation restrictions. Lecture Notes in Statistics: Extreme Value Theory, 51:69–80, 1989. (eds. J. Hüsler and R.-D. Reiss).

M. R. Leadbetter, G. Lindgren, and H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York, 1983.

A. W. Ledford and J. A. Tawn. Diagnostics for dependence within time series extremes. Journal of the Royal Statistical Society, Series B, 65:521–543, 2003.

A. W. Ledford and J. A. Tawn. Statistics for near independence in multivariate extreme values. Biometrika, 83:169–187, 1996.

A. W. Ledford and J. A. Tawn. Modeling dependence within joint tail regions. Journal of the Royal Statistical Society, Series B, 59:475–499, 1997.

A. W. Ledford and J. A. Tawn. Concomitant tail behaviour for extremes. Advances in Applied Probability, 30:197–215, 1998.

R. M. Loynes. Extreme values in uniformly mixing stationary stochastic processes. Annals of Mathematical Statistics, 36:993–999, 1965.

A. Marshall and I. Olkin. Domains of attraction of multivariate extreme-value distributions. Annals of Probability, 11:168–177, 1983.

A. P. Martins and H. Ferreira. The multivariate extremal index and the dependence structure of a multivariate extreme-value distribution. Test, 14:433–448, 2005.

G. J. McLachlan and D. Peel. Finite Mixture Models. John Wiley: New York, 2000.

Y. Mittal and D. Ylvisaker. Limit distributions for the maxima of stationary Gaussian sequences. Stochastic Processes and their Applications, 3:1–18, 1975.

T. Mori. Limit distributions of two-dimensional point processes generated by strong-mixing sequences. Yokohama Mathematical Journal, 25:155–168, 1977.

S. Nandagopalan. On the multivariate extremal index. Journal of Research of the National Institute of Standards and Technology, 99:543–550, 1994.

M. Nogaj, P. Yiou, S. Parey, F. Malek, and P. Naveau. Amplitude and frequency of temperature extremes over the North Atlantic region. Geophysical Research Letters, 33:L10801, 2006. doi:10.1029/2003GL019019.

D. Oakes. Direct calculation of the information matrix via the EM algorithm. Journal of the Royal Statistical Society, Series B, 61(2):479–482, 1999.

G. L. O'Brien. The maximum term of uniformly mixing sequences. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 30:57–63, 1974.

G. L. O'Brien. Extreme values for stationary and Markov sequences. Annals of Probability, 15:281–291, 1987.

D. E. Parker, T. P. Legg, and C. K. Folland. A new daily Central England temperature series, 1772–1991. International Journal of Climatology, 12:317–342, 1992.

R. Perfekt. Extremal behaviour of stationary Markov chains with applications. Annals of Applied Probability, 4:529–548, 1994.

J. Pickands. The two-dimensional Poisson process and extremal processes. Journal of Applied Probability, 8:745–756, 1971.

J. Pickands. Statistical inference using extreme order statistics. Annals of Statistics, 3:119–131, 1975.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2004. URL http://www.R-project.org. ISBN 3-900051-07-0.

A. Ramos and A. W. Ledford. A new class of models for bivariate joint tails. Journal of the Royal Statistical Society, Series B, 2008. To appear.

M. Rebetez. Seasonal relationship between temperature, precipitation and snow cover in a mountainous region. Theoretical and Applied Climatology, 54:99–106, 1996.

S. I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer-Verlag, 1987.

S. Richardson and P. J. Green. On Bayesian analysis of mixtures with unknown number of components (with discussion). Journal of the Royal Statistical Society, Series B, 59:731–792, 1997.

M. E. Robinson and J. A. Tawn. Extremal analysis of processes observed at different frequencies. Journal of the Royal Statistical Society, Series B, 62:117–135, 2000.

H. Rootzén. Extreme value theory for moving average processes. Annals of Probability, 14:612–652, 1986.

M. Schlather and J. A. Tawn. A dependence measure for multivariate and spatial extreme values: Properties and inference. Biometrika, 90(1):139–156, 2003.

J. Segers. Extreme events: dealing with dependence. Technical Report 2002-036, EURANDOM, 2002.

S. I. Seneviratne, D. Lüthi, M. Litschi, and C. Schär. Land-atmosphere coupling and climate change in Europe. Nature, 443(14):205–209, 2006.

M. Sibuya. Bivariate extreme statistics. Annals of the Institute of Statistical Mathematics, 11:195–210, 1960.

R. L. Smith. Maximum likelihood estimation in a class of non-regular cases. Biometrika, 72:67–90, 1985.

R. L. Smith. The extremal index for a Markov chain. Journal of Applied Probability, 29:37–45, 1992.

R. L. Smith and I. Weissman. Estimating the extremal index. Journal of the Royal Statistical Society, Series B, 56:515–528, 1994.

R. L. Smith and I. Weissman. Characterization and estimation of the multivariate extremal index. Unpublished, 1996.

M. Süveges. Likelihood estimation of the extremal index. Extremes, 10:41–55, 2007.

M. Süveges, M. Rebetez, and A. C. Davison. Nonstationarity of summer temperature extremes and the role of air humidity. Submitted to the International Journal of Climatology, 2008.

M. A. Tanner. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer-Verlag: New York, 1996.

J. A. Tawn. Bivariate extreme value theory: models and estimation. Biometrika, 77:245–253, 1988.

R. von Mises. La distribution de la plus grande de n valeurs. In Selected Papers, volume II, pages 271–294. Providence, R.I.: American Mathematical Society, 1954.

I. Weissman and S. Y. Novak. On blocks and runs estimators of the extremal index. Journal of Statistical Planning and Inference, 66:281–288, 1998.

H. White. Maximum likelihood estimation of misspecified models. Econometrica, 50(1):1–25, 1982.

H. White. Estimation, Inference and Specification Analysis. Cambridge University Press, 1994.

Z. Zhang. The estimation of M4 processes with geometric moving patterns. Annals of the Institute of Statistical Mathematics, 60:121–150, 2008. doi:10.1007/s10463-006-0078-0.

Z. Zhang. Multivariate extremes, max-stable process estimation and dynamic financial modeling. PhD thesis, University of North Carolina at Chapel Hill, 2002.

Z. Zhang and R. L. Smith. The behavior of multivariate maxima of moving maxima processes. Unpublished, 2004.
Curriculum vitae
Personal information
Date and place of birth: April 20, 1967, Budapest (Hungary)
Nationality: Hungarian
Marital status: married, 3 children (born in 1994, 1998 and 2000)
Education and experience
2003–2008  PhD at École Polytechnique Fédérale de Lausanne, Institute of Mathematics
2003       Diplôme postgrade en statistique, Université de Neuchâtel
           Title: Information matrix in the case of random explanatory variables
           Advisor: Prof. Gérard Antille (University of Geneva)
2001–2003  Postgrade in Statistics, Université de Neuchâtel
1997–2003  Following my husband and staying home with the family (France, Switzerland)
1996–1997  MultiRacio Ltd, Budapest
           Development of statistical software for the estimation of unemployment rates
1993–1996  MTA-KFKI Research Institute for Particle and Nuclear Physics, Budapest
1993       Master Degree in Physics/Astrophysics
           Title: Static axisymmetric ellipsoidal vacuum spacetimes
           Supervisor: Dr. István Rácz
1985–1993  Roland Eötvös University, Faculty of Natural Sciences, Budapest
           Studies in geophysics, physics and astronomy