

Ngai Hang Chan and Giovanni Petris
Carnegie Mellon University
Department of Statistics
Pittsburgh, PA 15213-3890, USA
Key Words: Markov chain Monte Carlo; state-space models.

We propose a simulation-based Bayesian approach to the analysis of
long memory stochastic volatility models, stationary and nonstationary. The
main tool used to reduce the likelihood function to a tractable form is an
approximate state-space representation of the model. A data set of stock
market returns is analyzed with the proposed method. The approach taken
here allows a quantitative assessment of the empirical evidence in favor of the
stationarity, or nonstationarity, of the instantaneous volatility of the data.

Empirical analysis of financial data has by now provided overwhelming
evidence that stock returns cannot be satisfactorily modeled by the linear
ARMA models popularized by Box and Jenkins. One feature that is not compatible
with a linear model is the observed shifts in the instantaneous variance,
or volatility, of the series. Autoregressive Conditionally Heteroskedastic
(ARCH) models were first proposed by Engle (1982) and then extended by
Bollerslev (1986) to account for this behavior. In these models the variance
of a time series is assumed to be a predictable process, i.e., a deterministic
function of the past. Estimation of the parameters is customarily done
using quasi-maximum-likelihood procedures. Stochastic Volatility (SV) is an
alternative class of models that accounts for volatility clustering. Here the
instantaneous variance of the observed series is modeled as a non-observable, or
latent, process. The volatility can alternatively be thought of as the state vector,
with its own evolution equation, of a non-linear state-space model. Conceptually,
this represents an extension with respect to GARCH models, since the
evolution of the volatility is not completely determined by the past observations
but includes a stochastic component. On the other hand, one can
argue for a predictable volatility on the grounds of an efficient market
hypothesis. Several estimation procedures can be adopted to fit SV models. Melino
and Turnbull (1990) use a Generalized Method of Moments, which is straightforward
to implement but not efficient. Harvey, Ruiz and Shephard (1994)
propose a quasi-maximum-likelihood approach, based on the transformation
of the model into a linear state-space form with non-Gaussian observation errors.
A Bayesian approach is taken by Jacquier, Polson and Rossi (1994), while
Kim, Shephard and Chib (1998) suggest a simulation-based exact maximum-likelihood
estimator.
Recently, a number of attempts have been made to extend GARCH
and SV models in order to capture the long-term dependence structure in
the volatility detected by a number of empirical studies (see for example
Ding, Granger and Engle (1993) and de Lima and Crato (1993)). In particular,
Robinson (1991) introduced the Fractionally Integrated GARCH class
of models, which was subsequently studied in Baillie, Bollerslev and Mikkelsen
(1996) using a quasi-maximum-likelihood estimation method. On the other
hand, Breidt, Crato and de Lima (1998) extended stochastic volatility to the
Long Memory Stochastic Volatility (LMSV) class of models. The estimation
method they propose is based on the spectral approximation to the Gaussian
likelihood.
In this paper, a novel Bayesian approach based on the MCMC method
is proposed to estimate an LMSV model. By means of the truncated likelihood
method given in Chan and Palma (1998), an LMSV model is expressed in a
linear state-space formulation, in terms of a dynamic linear model as introduced
in West and Harrison (1997). The dynamic linear model offers a natural
platform on which to conduct MCMC Bayesian estimation for LMSV models. Although
the proposed method is conceptually similar to the Bayesian method discussed
in Jacquier et al. (1994), there remain several fundamental differences. First,
our method offers a direct parametrization of an LMSV model, which was not
covered in Jacquier et al. Second, the dynamic linear model setup allows one
to deal with nonstationarity directly. This may turn out to be an important
property, since many financial data sets exhibit nonstationary behavior in
addition to the long memory phenomenon. Third, the truncated likelihood
method offers a convenient basis for sampling some components of the posterior
directly, although the Metropolis-Hastings algorithm is still required
for other components. Together, these features make our approach an efficient
method for estimating an LMSV model.
The paper is organized as follows. The LMSV model and its state-space
formulation, together with a description of the MCMC sampling scheme, are
given in Section 2, while an application of the proposed method to a real data
set is illustrated in Section 3. The paper concludes in Section 4.


The basic setup is the stochastic volatility model:

    y_t = σ_t ε_t,
    σ_t = σ exp(v_t / 2),                                          (1)

where y_t is observed at time t, (ε_t)_{t ∈ Z} is a sequence of independent standard
Normal random variables, σ is a positive constant, and the sequence (v_t)_{t ≥ 1}
satisfies the ARFIMA relation

    (1 − B)^d φ(B) v_t = θ(B) η_t,                                 (2)

where d ∈ (−0.5, 0.5), (η_t)_{t ∈ Z} is Gaussian white noise with variance σ_η², φ(·)
and θ(·) are polynomials of orders p and q, respectively, with all their roots outside
the unit circle and with no common root, and B is the backshift operator.
Equation (2) implies that (v_t) has a moving average representation in terms
of the white noise (η_t). One can truncate the infinite moving average to a
finite number of terms M, say, to obtain an approximate representation of (v_t).
Chan and Palma (1998) prove that a better approximation can be obtained by
considering the corresponding truncation of the moving average representation
of the first difference of (v_t):

    (1 − B)^{d − 1} φ(B) Δv_t = θ(B) η_t,                          (3)

so that

    Δv_t = (1 − B)^{−d + 1} φ(B)^{−1} θ(B) η_t
         = Σ_{j=0}^{∞} ψ_j B^j η_t
         ≈ Σ_{j=0}^{M} ψ_j B^j η_t
         = Σ_{j=0}^{M} ψ_j η_{t−j}.

Note that the coefficients ψ_j of the truncated moving average representation
of (Δv_t) are functions of d, φ = (φ_1, …, φ_p) and θ = (θ_1, …, θ_q) (to achieve
identification, we assume φ_0 = θ_0 = 1). The choice of the truncation parameter
M is discussed in Chan and Palma (1998). In practice, we found that a value
of M between 10 and 20 provides a sufficiently accurate approximation.
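As an illustration, the truncated coefficients ψ_0, …, ψ_M can be computed by the standard recursion for the fractional binomial expansion of (1 − B)^{−(1−d)}, followed by multiplication by θ(B) and division by φ(B). The sketch below is ours, not code from the paper; it assumes q ≤ M and the AR convention φ(B) = 1 − Σ φ_i B^i:

```python
import numpy as np

def psi_coefficients(d, phi=(), theta=(), M=20):
    """Truncated MA coefficients psi_0..psi_M of
    Delta v_t = (1-B)^{-(d-1)} theta(B) / phi(B) eta_t,
    with phi(B) = 1 - sum phi_i B^i and theta(B) = 1 + sum theta_i B^i."""
    delta = 1.0 - d                       # exponent of (1-B)^{-delta}
    # fractional binomial expansion: c_0 = 1, c_j = c_{j-1} (j-1+delta)/j
    c = np.empty(M + 1)
    c[0] = 1.0
    for j in range(1, M + 1):
        c[j] = c[j - 1] * (j - 1 + delta) / j
    # multiply by theta(B): finite convolution, truncated at lag M
    th = np.zeros(M + 1)
    th[0] = 1.0
    th[1:1 + len(theta)] = theta          # assumes q <= M
    a = np.convolve(c, th)[:M + 1]
    # divide by phi(B): psi_j = a_j + sum_{i=1}^{min(j,p)} phi_i psi_{j-i}
    psi = np.empty(M + 1)
    for j in range(M + 1):
        psi[j] = a[j] + sum(phi[i - 1] * psi[j - i]
                            for i in range(1, min(j, len(phi)) + 1))
    return psi
```

In the pure fractional case (p = q = 0) this reduces to ψ_j = Γ(j + 1 − d) / (Γ(1 − d) Γ(j + 1)), which provides a convenient check of the recursion.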
Let x_t = log y_t² and u_t = log ε_t². Then (1) implies the following:

    x_t = v_t + u_t + log σ²,                                      (4)

and, differencing,

    Δx_t = Δv_t + Δu_t.                                            (5)

We have therefore the following approximate model:

    Δx_t = Δv_t + ξ_t,
    Δv_t = Σ_{j=0}^{M} ψ_j η_{t−j},                                (6)
    ξ_t = u_t − u_{t−1}.
This can be conveniently represented in terms of a Dynamic Linear Model
(DLM), as we describe next. Since (Δv_t) and (ξ_t) are independent
moving average processes, each one has a state-space representation:

    X_{t+1} = [[0, 0'], [I_M, 0]] X_t + (η_{t+1}, 0, …, 0)',
    Δv_t = (ψ_0, …, ψ_M) X_t,                                      (7a)

and

    Z_{t+1} = −u_t,
    ξ_t = Z_t + u_t.                                               (7b)

Here, X_t = (X_{t,1}, …, X_{t,M+1})' is a vector of dimension M + 1 and Z_t is a
vector of dimension 1, i.e., a real number. The two DLMs (7a) and (7b) can
be easily combined. To this end, let α_t = (Z_t, X_t')'. Then

    α_{t+1} = G α_t + (−u_t, η_{t+1}, 0, …, 0)',
    Δx_t = (1, ψ_0, …, ψ_M) α_t + u_t,                             (8)

where G is the (M + 2) × (M + 2) matrix whose lower-right (M + 1) × (M + 1)
block is the transition matrix [[0, 0'], [I_M, 0]] of (7a), with zeros elsewhere.
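For concreteness, the system matrices of the combined DLM (8) can be assembled as follows. This is a sketch with our own naming conventions, not code from the paper; the time-1 mean 1.27 and variance π²/2 in the first state component come from the moments of the log χ²_1 distribution used below:

```python
import numpy as np

def build_dlm(psi, sigma2_eta):
    """Assemble the combined DLM (8): the state is alpha_t = (Z_t, X_t')'
    of length M + 2, with transition alpha_{t+1} = G alpha_t + w_{t+1} and
    observation Delta x_t = h' alpha_t + u_t."""
    M = len(psi) - 1
    n = M + 2
    G = np.zeros((n, n))
    G[2:n, 1:n - 1] = np.eye(M)        # shift block: X_{t+1,j} = X_{t,j-1}
    h = np.concatenate(([1.0], psi))   # observation vector (1, psi_0..psi_M)
    m1 = np.zeros(n)
    m1[0] = 1.27                       # E[Z_1] = E[-u_0] = 1.27
    C1 = np.diag([np.pi ** 2 / 2] + [sigma2_eta] * (M + 1))
    return G, h, m1, C1
```

The first row of G is zero because Z_{t+1} = −u_t enters purely through the noise vector, and the second row is zero because X_{t+1,1} = η_{t+1}.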
The DLM is completely specified once a distribution for the state vector at
time 1 is given. If we imagine that the dynamics of the system can be extended
into the past, the components of α_1 have the following interpretation:

    Z_1 = −u_0,
    X_{1,j} = η_{2−j},   j = 1, …, M + 1.

Guided by this interpretation, we assign α_1 a Normal distribution with mean
(1.27, 0, …, 0)' and variance diag(π²/2, σ_η², …, σ_η²). Note that −1.27 and π²/2
are the mean and variance of the log χ²_1 distribution. It would be very convenient
to work with a Gaussian DLM. In order to do so, we approximate the
distribution of u_t (t ≥ 1), which is the log χ²_1 distribution, to a convenient
accuracy with a finite mixture of normal distributions. Denoting by L(X) the
distribution of X, for any random element X, we can write

    L(u_t) ≈ Σ_{j=1}^{N} π_j N(m_j, σ_j²),                         (10)

where N(m, σ²) denotes a Gaussian distribution with mean m and variance σ²,
and the π_j's are positive weights adding to one. Note that, for a given N, the
computation of π_j, m_j, σ_j², using nonlinear optimization techniques, is done
once and for all before starting the simulations. An excellent approximation
can be obtained with N as low as five. For further details on the computations
see Shephard (1994), Chan and Petris (1999) and references therein. One can
think of u_t, to the degree of accuracy implied by the approximation (10), as a
random variable that can be generated in two steps: first, pick at random one
integer j between 1 and N according to the probability distribution defined by
π_1, …, π_N; then, draw a number from a N(m_j, σ_j²) distribution. This consideration
suggests that we add to our model a vector of T discrete independent
"latent" variables

    K = (K_1, …, K_T),                                             (11)

with distribution defined by

    P(K_t = j) = π_j,   j = 1, …, N,   t = 1, …, T.                (12)
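The two-step generation just described can be sketched directly. The three mixture components below are illustrative placeholders of our own, not the fitted approximation of the log χ²_1 distribution; the mechanism is identical for any (π_j, m_j, σ_j²):

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative mixture parameters (hypothetical placeholders only)
pi_w = np.array([0.2, 0.5, 0.3])     # weights pi_j, summing to one
m = np.array([-3.0, -1.0, 0.5])      # component means m_j
s2 = np.array([4.0, 1.0, 0.25])      # component variances sigma_j^2

def draw_u(n):
    """Step 1: pick a component label K_t with P(K_t = j) = pi_j.
    Step 2: draw from N(m_j, sigma_j^2) given the label."""
    k = rng.choice(len(pi_w), size=n, p=pi_w)
    return rng.normal(m[k], np.sqrt(s2[k]))
```

The empirical mean and variance of such draws converge to the mixture moments Σ π_j m_j and Σ π_j (σ_j² + m_j²) − (Σ π_j m_j)², which is a quick sanity check on the sampler.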
The link between this latent vector and our model is set by putting

    P(u_t ≤ x | K) = ∫_{−∞}^{x} (σ_j √(2π))^{−1} exp(−(ξ − m_j)² / (2σ_j²)) dξ   if K_t = j,   (13)

j = 1, …, N, t = 1, …, T. Up to the approximation (10), the marginal distribution
of the sequence (u_t) has not changed; on the other hand, conditionally
on K, the DLM (8) is Gaussian.
While the distribution of K is dictated by the model, we have more
freedom in choosing a prior distribution for the parameters σ_η², d, φ, θ. Although
different choices are possible, we consider a noninformative prior on d, φ, θ
and a conjugate prior on σ_η². More specifically, the distribution of d, φ, θ is
taken to be uniform on its domain of definition, while the distribution of σ_η² is
an inverse gamma with parameters α, β, having density

    p(σ_η²) = (β^α / Γ(α)) (σ_η²)^{−(α+1)} exp(−β / σ_η²),   σ_η² > 0.   (14)

Furthermore, σ_η² is assumed to be independent of d, φ and θ.
Let us denote by Ψ the parameter of the model, including the latent
variables that we have introduced, i.e.,

    Ψ = ((α_t), K, σ_η², d, φ, θ).                                 (15)

To analyze the posterior distribution we need to generate a sample from the
distribution of Ψ, conditionally on the observed sequence (x_t), t = 1, …, T. Loosely
speaking, each of the six components of Ψ (which may itself be multidimensional,
such as (α_t), for example) is sampled from its full conditional distribution,
i.e., its conditional distribution given the data and the other parameters.
Since, given all the other parameters and latent variables, the model
reduces to the DLM (8), sampling from the full conditional distribution of (α_t)
is equivalent to sampling from the posterior distribution of the state vectors
at times t = 1, …, T in a completely specified Gaussian DLM. This can be
done efficiently using the forward filtering, backward sampling approach of
Fruhwirth-Schnatter (1994).
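A minimal sketch of the forward filtering, backward sampling recursion for a time-invariant Gaussian DLM with scalar observations is given below. This is our own generic implementation under standard assumptions, not the authors' code:

```python
import numpy as np

def ffbs(y, G, h, W, V, m0, C0, rng):
    """Draw one path (alpha_1, ..., alpha_T) from its posterior in the
    Gaussian DLM  alpha_t = G alpha_{t-1} + w_t,  y_t = h' alpha_t + v_t,
    with w_t ~ N(0, W), v_t ~ N(0, V) and alpha_0 ~ N(m0, C0)."""
    T, n = len(y), len(m0)
    m, C = np.empty((T, n)), np.empty((T, n, n))
    mt, Ct = np.asarray(m0, float), np.asarray(C0, float)
    for t in range(T):                      # forward pass: Kalman filter
        a, R = G @ mt, G @ Ct @ G.T + W
        Q = h @ R @ h + V                   # one-step forecast variance
        K = R @ h / Q                       # Kalman gain
        mt, Ct = a + K * (y[t] - h @ a), R - np.outer(K, h @ R)
        m[t], C[t] = mt, Ct
    alpha = np.empty((T, n))                # backward pass: sampling
    alpha[T - 1] = rng.multivariate_normal(m[T - 1], C[T - 1])
    for t in range(T - 2, -1, -1):
        R = G @ C[t] @ G.T + W
        J = C[t] @ G.T @ np.linalg.pinv(R)  # smoothing gain
        mean = m[t] + J @ (alpha[t + 1] - G @ m[t])
        cov = C[t] - J @ G @ C[t]
        cov = (cov + cov.T) / 2             # symmetrize for the sampler
        alpha[t] = rng.multivariate_normal(mean, cov)
    return alpha
```

The pseudo-inverse is used because, in state-space forms like (8), the state noise covariance W is singular, so R need not be invertible in the usual sense.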
Sampling from K is straightforward, once one realizes that the components
of K are, under the full conditional distribution, independent and have
finite support. Also straightforward is sampling σ_η²: since we chose a conjugate
prior, its full conditional distribution is again an inverse gamma.
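With an IG(α, β) prior on σ_η² and the innovations η_t recoverable from the sampled states, the conjugate update takes a single line. The function name and interface below are ours, a sketch rather than the paper's implementation:

```python
import numpy as np

def draw_sigma2_eta(eta, alpha0, beta0, rng):
    """Full conditional of sigma_eta^2: with an IG(alpha0, beta0) prior and
    eta_1..eta_n i.i.d. N(0, sigma_eta^2), the posterior is
    IG(alpha0 + n/2, beta0 + sum(eta_t^2)/2)."""
    a = alpha0 + len(eta) / 2
    b = beta0 + float(np.sum(np.square(eta))) / 2
    return 1.0 / rng.gamma(a, 1.0 / b)   # inverse gamma via reciprocal gamma
```

Note the parameterization: NumPy's `gamma(shape, scale)` is used with scale 1/b, so the reciprocal is an inverse gamma with shape a and scale b, matching density (14).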
The full conditional densities of the remaining parameters do not have
an analytic form which can be recognized as corresponding to any known and
well-studied distribution. Therefore, to draw from these distributions we use the
Metropolis-Hastings algorithm (see Tierney (1994)). For each one-dimensional
full conditional, the proposal distribution we use is based on a linear approximation
of the logarithm of the target density, as described in Gilks and Wild
(1992). More details about the MCMC simulation can be found in Chan and
Petris (1999).
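For intuition, a single Metropolis-Hastings update can be sketched as follows. Note two deliberate simplifications: we use a plain Gaussian random-walk proposal rather than the Gilks-Wild construction used in the paper, and the target below is a hypothetical stand-in for a full conditional of d, not the actual one:

```python
import numpy as np

def mh_step(x, log_target, step, lo, hi, rng):
    """One Metropolis-Hastings update with a Gaussian random-walk proposal;
    proposals outside (lo, hi) get log-density -inf and are rejected."""
    prop = x + step * rng.normal()
    lp = log_target(prop) if lo < prop < hi else -np.inf
    return prop if np.log(rng.uniform()) < lp - log_target(x) else x

# usage sketch: a hypothetical full conditional for d on (-0.5, 1.5)
rng = np.random.default_rng(1)
log_post = lambda d: -0.5 * (d - 0.7) ** 2 / 0.01   # stand-in, N(0.7, 0.1^2)
d, chain = 0.7, []
for _ in range(6000):
    d = mh_step(d, log_post, 0.2, -0.5, 1.5, rng)
    chain.append(d)
```

Because the random-walk proposal is symmetric, the acceptance ratio reduces to the difference of log target values; the domain restriction enforces the support of the prior.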

We apply the model and estimation technique described in the previous
section to a financial time series. The data consist of the daily returns for the
value-weighted market index from the Center for Research in Security Prices
from July 1962 to July 1989. Following common practice, the correlation
in the return data due to the day of the week and month of the year was
removed using standard filters. The series of the returns, analyzed also in
Breidt et al. (1998), is plotted in Figure 1. Figure 2 contains the log squares
of the returns. There seems to be an increasing trend in the series, suggesting
strong persistence or even nonstationarity. Fitting a straight line to the data
by ordinary least squares gives a t-value of 15 for the slope parameter. Since
the standard normality assumptions clearly do not hold here, it is difficult to
interpret this number, for example by attaching a p-value to it. However, we are
inclined to judge it, though informally, a 'big' number, prompting the use of a
model that allows, but does not assume, nonstationary behavior.
We model the return data as

    y_t = σ_t ε_t,
    σ_t = σ exp(v_t / 2),                                          (16)
    (1 − B)^d (1 − φ_1 B) v_t = η_t,

using the prior described in the previous section. The value of M was fixed
at 20. Larger values gave essentially the same results. For the mixture of
normals approximation (10) we used the values reported in Table 4 of Kim et
al. (1998).
Posterior summaries (the mean and four quantiles) for selected parameters
resulting from the MCMC simulation are reported in Table I.

    TABLE I
              0.05      0.25      Mean      0.75      0.95
    d         0.555     0.642     0.675     0.717     0.722
    φ_1       0.589     0.590     0.595     0.596     0.602
    σ_η²      0.000129  0.000229  0.002655  0.005216  0.077035
The posterior distribution of d confirms our feeling about the nonstationarity
of the volatility process.
One advantage of the Bayesian approach is that a full posterior distribution
is available, so that inference on events or quantities depending on
the parameters is conceptually straightforward. For example, one issue with
this kind of data is whether the process driving the volatility is stationary or
not. This formally corresponds to testing the hypothesis that d is less than 0.5.
For this data set, the posterior probability that the process is nonstationary
(d ≥ 0.5), evaluated from the Monte Carlo sample, turns out to be 99%. Note
that the prior we use is noninformative with respect to this issue, in the sense
that P(−0.5 < d < 0.5) = P(0.5 ≤ d < 1.5) = 1/2.
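Given the MCMC output, this posterior probability is simply the fraction of sampled d values at or above 0.5. The snippet below illustrates the computation with simulated stand-in draws (hypothetical, not the actual chain; in practice one would use the stored MCMC samples of d):

```python
import numpy as np

# stand-in posterior draws of d, roughly matching the reported summaries
rng = np.random.default_rng(2)
d_draws = rng.normal(0.675, 0.06, size=10_000)

# Monte Carlo estimate of P(d >= 0.5), the nonstationarity probability
p_nonstationary = np.mean(d_draws >= 0.5)
```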
Breidt et al. (1998), using an approach based on the spectral likelihood,
find estimates φ̂_1 = 0.932 and d̂ = 0.444. It should be pointed out, in order
to explain the discrepancy between those estimates and the corresponding
posterior means reported in Table I, that we are using a different model. In
fact, even if the set of equations describing the observation process and the
evolution of the volatility is the same, and is expressed in terms of the same
parameters, our parameter space is different. While we allow the long memory
parameter d to vary in (−0.5, 1.5), Breidt et al. constrain this parameter to
the stationarity region (−0.5, 0.5). In other words, the model considered here
includes as a proper submodel the one considered in Breidt et al. Note that,
in light of our results, the latter has a very low posterior probability, which
indicates that it provides a poor fit for the data. This in turn implies that the
extended, nonstationary model is much more appropriate for this data set.

Daily returns on stocks or indexes typically show nonlinear behavior.
Several models have been proposed to describe this kind of data, usually
assuming the unobservable volatility of the returns to follow either a stationary
process (e.g., GARCH, SV) or a nonstationary one (e.g., IGARCH). The
present paper introduces a model which transcends this dichotomy, encompassing
stationarity and nonstationarity, as well as long-range dependence, a feature
frequently observed in daily financial time series. The Bayesian approach
taken here allows one, by combining a noninformative prior with the evidence
provided by the data through the likelihood function, to obtain a readily
interpretable posterior probability of the volatility process being stationary. In
the example considered in Section 3, the evidence against stationarity is fairly
strong. Note that it is not uncommon, when analyzing daily returns using a
stationary GARCH or SV model, to obtain estimates of the parameters close
to the boundary of the stationarity region.
A stylized fact about daily returns that we have not considered here,
but that can easily be incorporated into the model, is the excess kurtosis of the
returns. This can be accommodated by assuming a fat-tailed distribution
(e.g., a Student's t) for ε_t in equation (1) instead of a normal distribution.
The analysis would then carry over in essentially the same way, with the only
difference being a change in the mixture of normals (10). Several other topics
remain open for future research, including the forecasting of future volatility and
the use of the model for option pricing.

The authors would like to thank Dr. Jay Breidt for kindly providing
the data set analyzed in Section 3. We would like to thank a referee and the
Guest Editor for helpful comments. This research was supported in part by
Earmarked Grant No. HKUST6082/98T from the Research Grants Council of
Hong Kong and by a National Science Foundation Group Infrastructure Grant
to the Department of Statistics at Carnegie Mellon University.

Baillie, R.T., Bollerslev, T. and Mikkelsen, H.O. (1996). Fractionally inte-
grated generalized autoregressive conditional heteroskedasticity. Journal
of Econometrics 74, 3-30.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedas-
ticity. Journal of Econometrics 31, 307-327.
Breidt, F.J., Crato, N. and de Lima, P. (1998). The detection and esti-
mation of long memory in stochastic volatility. Journal of Econometrics
83, 325-348.
Chan, N.H. and Palma, W. (1998). State space modeling of long-memory
processes. Annals of Statistics 26, 719-740.
Chan, N.H. and Petris, G. (1999). Bayesian analysis of long memory stochas-
tic volatility models. Technical report. Department of Statistics, Carnegie
Mellon University.
de Lima, P.J.F. and Crato, N. (1993). Long-range dependence in the condi-
tional variance of stock returns. Proceedings of the Business and Eco-
nomic Statistics Section, Joint Statistical Meetings, San Francisco.
Ding, Z., Granger, C. and Engle, R.F. (1993). A long memory property
of stock market returns and a new model. Journal of Empirical Finance
1, 83-106.
Engle, R. (1982). Autoregressive conditional heteroskedasticity with esti-
mates of the variance of UK inflation. Econometrica 50, 987-1008.
Fruhwirth-Schnatter, S. (1994). Data augmentation and dynamic linear mod-
els. Journal of Time Series Analysis 15, 183-202.

Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs
sampling. Applied Statistics 41, 337-348.
Harvey, A., Ruiz, E. and Shephard, N. (1994). Multivariate stochastic
variance models. Review of Economic Studies 61, 247-264.
Jacquier, E., Polson, N. and Rossi, P. (1994). Bayesian analysis of stochas-
tic volatility models. Journal of Business and Economic Statistics 12.
Kim, S., Shephard, N. and Chib, S. (1998). Stochastic volatility: likeli-
hood inference and comparison with ARCH models. Review of Economic
Studies 65, 361-393.
Melino, A. and Turnbull, S. (1990). Pricing foreign currency options with
stochastic volatility. Journal of Econometrics 45, 239-265.
Robinson, P. (1991). Testing for strong serial correlation and dynamic condi-
tional heteroskedasticity in multiple regression. Journal of Econometrics
47, 67-84.
Shephard, N. (1994). Partial non-Gaussian state-space. Biometrika 81, 115-.
Tierney, L. (1994). Markov chains for exploring posterior distributions. The
Annals of Statistics 22, 1701-1762.
West, M. and Harrison, J. (1997). Bayesian forecasting and dynamic models,
2nd Ed. Springer-Verlag, New York.

FIG. 1. Daily Returns.


FIG. 2. Log squared returns.
