2. Frequency Domain
Look at the total variability of the time series.
Decompose the variance var x_t into components at different frequencies (or periods); i.e. if
x_t = A sin(2πνt) + B cos(2πνt), where A, B are random variables with var A = var B = σ² and
cov(A, B) = 0, then var x_t = σ² sin²(2πνt) + σ² cos²(2πνt) = σ².
Graphical Methods
Preliminary goal: describe the overall structure of the data; also identify the memory type of the time series.
1. Short-Memory
Immediate past may give some information about immediate future but less information about long-term
future.
2. Long-Memory
Past gives more information about long-term future.
Examples: series with polynomial trends or cycles/seasonality.
1. Bernoulli Trials
x_t = { 1 with probability p; −1 with probability 1 − p }.
2. Random Walk
Start with w_t ~ iid.
Set x_0 = 0. Let x_t = w_1 + w_2 + ⋯ + w_t by cumulatively summing (or integrating) the w_t's.
It can be expressed as x_t = x_{t−1} + w_t.
Can also add a drift: x_t = δ + x_{t−1} + w_t.
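As a quick numerical sketch (not part of the notes; NumPy, with made-up drift and noise scale), the random walk with drift is generated by cumulatively summing the w_t's:

```python
import numpy as np

def random_walk(n, drift=0.0, sigma=1.0, seed=0):
    """Simulate x_t = delta + x_{t-1} + w_t with x_0 = 0 and w_t ~ iid N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n)
    # cumulative sum of the w_t's, plus the linear contribution of the drift
    return drift * np.arange(1, n + 1) + np.cumsum(w)

x = random_walk(1000, drift=0.1)
```

For a nonzero drift δ, x_t/t converges to δ, which is one way to spot a drifting walk on a plot.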
3. Linear Trend
x_t = a_0 + a_1 t + w_t, where w_t ~ iid.
a_0 and a_1 are constants that can be estimated through least squares.
Plot the residuals e_t = x_t − â_0 − â_1 t. They should look like the w_t's.
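A minimal illustration of the least-squares detrending step (my own sketch; the data are simulated with made-up values a_0 = 2, a_1 = 0.5):

```python
import numpy as np

# Hypothetical series with a linear trend: a0 = 2, a1 = 0.5, iid N(0,1) noise
rng = np.random.default_rng(42)
n = 200
t = np.arange(1, n + 1)
x = 2.0 + 0.5 * t + rng.normal(0.0, 1.0, size=n)

# Least-squares estimates (np.polyfit returns slope first, then intercept)
a1_hat, a0_hat = np.polyfit(t, x, 1)

# Residuals e_t = x_t - a0_hat - a1_hat * t should resemble the w_t's
e = x - a0_hat - a1_hat * t
```

The residuals have mean zero by construction; checking their sample ACF is the natural next diagnostic.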
4. Seasonal Model
Example: series affected by the weather.
May need to include a periodic component with some fixed known period T .
x_t = a_0 + a_1 cos(2πνt) + a_2 sin(2πνt) + w_t, where w_t ~ iid. ν = 1/T is a fixed frequency.
5. White Noise
w_t ~ WN(0, σ_w²), i.e. E w_t = 0, var w_t = σ_w², and the w_t's are uncorrelated.
6. First-Order Moving Average or MA(1) Process
x_t = w_t + θw_{t−1}, t = 0, ±1, …, with w_t ~ WN(0, σ_w²).
Then E x_t = 0 and var x_t = σ_w²(1 + θ²).
7. First-Order Autoregressive or AR(1) Process
x_t = φx_{t−1} + w_t, t = 0, ±1, …, with w_t ~ WN(0, σ_w²).
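Both processes are easy to simulate directly from their defining recursions; a sketch (not from the notes) using Gaussian white noise and arbitrary parameters θ = 0.6 and φ = 0.8:

```python
import numpy as np

def simulate_ma1(n, theta, sigma=1.0, seed=0):
    """MA(1): x_t = w_t + theta * w_{t-1}, w_t ~ WN(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n + 1)
    return w[1:] + theta * w[:-1]

def simulate_ar1(n, phi, sigma=1.0, seed=0, burn=200):
    """AR(1): x_t = phi * x_{t-1} + w_t; a burn-in discards the effect of x_0 = 0."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n + burn)
    x = np.empty(n + burn)
    x[0] = w[0]
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + w[t]
    return x[burn:]

x_ma = simulate_ma1(5000, theta=0.6)   # var should be near 1 + theta^2 = 1.36
x_ar = simulate_ar1(5000, phi=0.8)     # var should be near 1/(1 - phi^2) = 2.78
```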
General Modelling Approach
1. Plot the series. Examine the graph for:
a trend,
a seasonal component,
an obvious change point,
outliers.
2. Remove trend and seasonal components; this should provide stationary residuals. Two main techniques:
Estimate the trend and/or seasonal components and subtract them from the data.
Differencing the data: replace x_t by y_t = x_t − x_{t−1}, or in general y_t = x_t − x_{t−d} for some positive integer d.
3. Fit a model to the stationary residuals.
4. Forecast the residuals, then invert to forecast xt .
5. Alternatively, use a frequency domain (or spectral) approach. Express the series in terms of sinusoidal waves of different
frequencies, i.e. Fourier components.
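The differencing step in point 2 above can be sketched in one line each with NumPy (illustrative data of my own):

```python
import numpy as np

x = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 11.0])

# First difference: y_t = x_t - x_{t-1}
y1 = np.diff(x)        # [2., -1., 4., -1., 4.]

# Lag-d difference (here d = 2): y_t = x_t - x_{t-d}
d = 2
yd = x[d:] - x[:-d]    # [1., 3., 3., 3.]
```

Note the output is shorter than the input by d observations.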
1
Special case: If xt ~iid N 0,1 , then F c1, , cn = ct where x =
t=1
2 of 17
dz .
F t x
where F t x = P x t x .
x
Definition: Autocorrelation Function (ACF)
ρ(s, t) = γ(s, t)/√(γ(s, s)γ(t, t)) = cov(x_s, x_t)/√(var x_s · var x_t), for all s, t.
Remark
If x_t = β_0 + β_1 x_s, then β_1 > 0 implies ρ(s, t) = 1 and β_1 < 0 implies ρ(s, t) = −1.
For a stationary process {x_t}, the ACF is given by
ρ(h) = γ(t + h, t)/√(γ(t + h, t + h)γ(t, t)) = γ(h)/γ(0).
Definition: Linear Process
A linear process {x_t} is given by x_t = μ + Σ_{j=−∞}^{∞} ψ_j w_{t−j}, where w_t ~ wn(0, σ_w²) and Σ_{j=−∞}^{∞} |ψ_j| < ∞.
Remark
One can show for x_t that γ(h) = σ_w² Σ_{j=−∞}^{∞} ψ_j ψ_{j+h}.
ESTIMATION OF AUTOCORRELATION
Assumptions
x1, , xn are data points (one realization) of a stationary process.
Definition: Sample Mean
The sample mean is defined by x̄ = (1/n) Σ_{t=1}^{n} x_t.
Definition: Sample Autocovariance Function
γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄), h = 0, 1, …, n − 1.
Definition: Sample ACF
ρ̂(h) = γ̂(h)/γ̂(0).
If x_t ~ iid with finite variance, then for large n, ρ̂(h), h = 1, 2, …, H, is approximately normal with mean 0 and variance 1/n, i.e. ρ̂(h) ~ AN(0, 1/n).
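The sample ACF and the ±1.96/√n white-noise bands follow directly from the definitions; a sketch (my own, assuming NumPy and iid test data):

```python
import numpy as np

def sample_acf(x, max_lag):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0), with
    gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    gamma = np.array([np.sum(d[h:] * d[:n - h]) / n for h in range(max_lag + 1)])
    return gamma / gamma[0]

# For iid data, rho_hat(h), h >= 1, should mostly lie inside +/- 1.96/sqrt(n)
rng = np.random.default_rng(7)
x = rng.normal(size=2000)
rho = sample_acf(x, 10)
band = 1.96 / np.sqrt(len(x))
```

Lags escaping the band far more often than about 5% of the time contradict the iid hypothesis.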
∇x_t = (1 − B)x_t = x_t − x_{t−1}, which removes a linear trend.
Note
∇x_t is an example of a linear filter. Differencing plays a central role in ARIMA modelling.
Smoothing: f_t = Σ_{i=−k}^{k} w_i x_{t+i}, where the weights satisfy w_i ≥ 0 and Σ_{i=−k}^{k} w_i = 1. A kernel smoother takes the weights proportional to a kernel, e.g. the normal kernel k(z) = (1/√(2π)) e^{−z²/2}.
ARIMA Models
AUTOREGRESSIVE MOVING AVERAGE (ARMA) MODELS
Previously examined a classical regression approach, which is not sufficient for all time series encountered in practice.
Definition: Autoregressive Model
An autoregressive model of order p, denoted AR(p), is given by x_t = φ_1 x_{t−1} + ⋯ + φ_p x_{t−p} + w_t, where
x_t is stationary, φ_1, …, φ_p are constants (φ_p ≠ 0), and w_t ~ wn(0, σ_w²).
Note
If E x_t = μ ≠ 0, then x_t − μ = φ_1(x_{t−1} − μ) + ⋯ + φ_p(x_{t−p} − μ) + w_t; equivalently, x_t = α + φ_1 x_{t−1} + ⋯ + φ_p x_{t−p} + w_t, where α = μ(1 − φ_1 − ⋯ − φ_p).
Definition: Autoregressive Operator
φ(B) = 1 − φ_1 B − ⋯ − φ_p B^p is known as the autoregressive operator.
Note
The AR(p) model can be written as (1 − φ_1 B − ⋯ − φ_p B^p)x_t = w_t, or φ(B)x_t = w_t.
Example: AR(1) Process
x_t = φx_{t−1} + w_t, where t = 0, ±1, ±2, … and w_t ~ wn(0, σ_w²).
Iterating k times yields x_t = φ^k x_{t−k} + Σ_{j=0}^{k−1} φ^j w_{t−j}.
If |φ| < 1, letting k → ∞ gives x_t = Σ_{j=0}^{∞} φ^j w_{t−j}, and E x_t = Σ_{j=0}^{∞} φ^j E w_{t−j} = 0.
γ(h) = cov(x_{t+h}, x_t) = σ_w² φ^h/(1 − φ²), h ≥ 0, and ρ(h) = γ(h)/γ(0) = φ^h, h ≥ 0.
If φ > 0, the sample path is smooth; if φ < 0, it is choppy.
For the MA(1) process,
γ(h) = cov(x_{t+h}, x_t) = { σ_w²(1 + θ²), h = 0; θσ_w², |h| = 1; 0, |h| > 1 },
and ρ(h) = { θ/(1 + θ²), |h| = 1; 0, |h| > 1 }.
Note that for the MA(1) process, x_t is only correlated with x_{t−1} and x_{t+1}.
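A quick numerical check of these ACF formulas (my own sketch, with arbitrary parameters φ = 0.7 and θ = 0.5): the lag-1 sample autocorrelations of long simulated paths should approach φ for AR(1) and θ/(1 + θ²) = 0.4 for MA(1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000

def acf1(z):
    """Lag-1 sample autocorrelation."""
    d = z - z.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)

# AR(1) with phi = 0.7: theory says rho(1) = phi
phi = 0.7
w = rng.normal(size=n + 500)
x = np.empty(n + 500)
x[0] = w[0]
for t in range(1, n + 500):
    x[t] = phi * x[t - 1] + w[t]
rho1_ar = acf1(x[500:])        # should be close to 0.7

# MA(1) with theta = 0.5: theory says rho(1) = theta/(1 + theta^2) = 0.4
theta = 0.5
v = rng.normal(size=n + 1)
y = v[1:] + theta * v[:-1]
rho1_ma = acf1(y)              # should be close to 0.4
```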
Definition: ARMA(p, q)
A time series {x_t} is ARMA(p, q) if it is stationary and x_t = φ_1 x_{t−1} + ⋯ + φ_p x_{t−p} + w_t + θ_1 w_{t−1} + ⋯ + θ_q w_{t−q}, where
w_t ~ iid N(0, σ_w²). If the mean μ is nonzero, include the constant α = μ(1 − φ_1 − ⋯ − φ_p).
Note: We can write φ(B)x_t = θ(B)w_t.
Definition: Causal
An ARMA(p, q) model φ(B)x_t = θ(B)w_t is said to be causal if
x_t = Σ_{j=0}^{∞} ψ_j w_{t−j} = ψ(B)w_t, where ψ(B) = Σ_{j=0}^{∞} ψ_j B^j and Σ_{j=0}^{∞} |ψ_j| < ∞.
For the AR(1) model we have x_t = Σ_{j=0}^{∞} φ^j w_{t−j}, so AR(1) with |φ| < 1 is causal. Equivalently, the process is causal only when the root of
φ(z) = 1 − φz lies outside the unit circle.
Then E x_t = Σ_{j=0}^{∞} ψ_j E w_{t−j} = 0.
For the MA(q) process x_t = Σ_{j=0}^{q} θ_j w_{t−j} (with θ_0 = 1),
γ(h) = cov(x_{t+h}, x_t) = E[(Σ_{j=0}^{q} θ_j w_{t+h−j})(Σ_{j=0}^{q} θ_j w_{t−j})]
= { σ_w² Σ_{j=0}^{q−h} θ_j θ_{j+h}, 0 ≤ h ≤ q; 0, h > q }.
ρ(h) = γ(h)/γ(0) = { Σ_{j=0}^{q−h} θ_j θ_{j+h} / Σ_{j=0}^{q} θ_j², 0 ≤ h ≤ q; 0, h > q }.
Example: For MA(1) (q = 1, θ_1 = θ): γ(±1) = θσ_w², γ(h) = 0 for |h| > 1, and ρ(±1) = θ/(1 + θ²), ρ(h) = 0 for |h| > 1.
Yule-Walker Equations
In matrix form, Γ_p φ = γ_p and σ_w² = γ(0) − φᵀγ_p, where Γ_p = [γ(j − k)]_{j,k=1}^{p}, φ = (φ_1, …, φ_p)ᵀ, and γ_p = (γ(1), …, γ(p))ᵀ. Equivalently, φ = Γ_p^{−1}γ_p and σ_w² = γ(0) − γ_pᵀΓ_p^{−1}γ_p.
These equations determine γ(0), …, γ(p) from φ and σ_w² (h = 0, …, p).
However, we can substitute the sample autocovariances γ̂(h) into the equations to estimate φ and σ_w².
Minimize Σ_{t=p+1}^{n} f(x_t − φ_1 x_{t−1} − ⋯ − φ_p x_{t−p}) over φ_1, …, φ_p.
Examples of f: f(x) = x² (least squares), f(x) = |x|.
Maximum Likelihood
Assume w_t ~ iid N(0, σ_w²).
Use x_1, …, x_n to obtain the likelihood function, i.e. the joint density of x_1, …, x_n, which is a function of the unknown parameters.
Portmanteau Test
Define T = n Σ_{j=1}^{H} ρ̂²(j). Under the white-noise hypothesis, ρ̂(h) ~ AN(0, 1/n), or √n ρ̂(h) ≈ N(0, 1). Then T is approximately χ²_H.
Definition: Partial Autocorrelation Function (PACF)
Define φ_11 = ρ(1) and, for h ≥ 2, let φ_hh be the last component of φ_h = Γ_h^{−1}γ_h (the solution of the Yule-Walker equations of order h).
Note: This is the correlation between x_t and x_{t+h} with dependence on x_{t+1}, …, x_{t+h−1} removed.
For an iid sequence, φ̂_hh ~ AN(0, 1/n) for large n.
Lemma
If x_t is an AR(p) process, then φ_hh = 0 for all h > p.
Procedure
Choose p such that φ̂_{p+1,p+1}, φ̂_{p+2,p+2}, … are close to zero (statistically zero).
Akaike Information Criterion (AIC)
Compare different AR models.
Based on likelihood function.
How to compare? Look at the estimates σ̂² of var w_t for each model.
Estimate σ²_p for each AR(p) model by Yule-Walker: σ̂²_p = γ̂(0)(1 − φ̂_1 ρ̂(1) − ⋯ − φ̂_p ρ̂(p)).
Define AIC(p) = n ln σ̂²_p + 2p.
Choose the model order that minimizes AIC(p).
Should use AIC together with graphical methods.
AIC tends to choose higher-order models. If two models have similar AIC values, choose the lower-order model (parsimony).
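The Yule-Walker recipe together with AIC(p) = n ln σ̂²_p + 2p can be sketched as follows (not from the notes; an AR(2) with made-up coefficients 0.6 and −0.3 serves as test data):

```python
import numpy as np

def sample_acov(x, max_lag):
    """gamma_hat(h) for h = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    return np.array([np.sum(d[h:] * d[:n - h]) / n for h in range(max_lag + 1)])

def yule_walker_aic(x, p):
    """Yule-Walker AR(p) fit: returns (phi_hat, sigma2_hat, AIC(p) = n*ln(sigma2) + 2p)."""
    n = len(x)
    g = sample_acov(x, p)
    if p == 0:
        return np.array([]), g[0], n * np.log(g[0])
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(Gamma, g[1:])          # phi = Gamma_p^{-1} gamma_p
    sigma2 = g[0] - phi @ g[1:]                  # sigma2 = gamma(0) - phi' gamma_p
    return phi, sigma2, n * np.log(sigma2) + 2 * p

# Simulated AR(2) test data
rng = np.random.default_rng(11)
n = 3000
w = rng.normal(size=n + 300)
x = np.zeros(n + 300)
for t in range(2, n + 300):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + w[t]
x = x[300:]

aics = [yule_walker_aic(x, p)[2] for p in range(6)]
best_p = int(np.argmin(aics))
```

Consistent with the remark above, AIC may occasionally pick an order above the true one, so the only safe expectation is that it does not underfit.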
Consider φ(B)x_t = θ(B)w_t where w_t ~ iid N(0, σ_w²). Then x_t is a Gaussian process.
Given parameters φ_1, …, φ_p, θ_1, …, θ_q, and σ_w², one can determine the joint density f(x_1, …, x_n).
Given data x_1, …, x_n, one can determine the parameter values that maximize f(x_1, …, x_n).
We have (x_1, …, x_n)ᵀ ~ N(0, Γ_n), where Γ_n = [γ(i − j)]_{i,j=1}^{n}, so the likelihood can be written explicitly in terms of the parameters and maximized numerically.
ARMA MODELLING
When do we fit an ARMA(p, q) model as opposed to a pure AR(p) or MA(q) model?
Sample correlations are the key.
Data appears to be stationary with no obvious trends (polynomial or seasonal), and the sample ACF ρ̂(h) decays fairly
rapidly (geometrically).
However, the sample ACF and PACF do not cut off after some small lag, i.e. ρ̂(h) and φ̂_hh are not small for moderate h.
Diagnostics
Observations: x_1, …, x_n.
Fit an ARMA(p, q) model, i.e. choose the model order and obtain the MLEs of the parameters φ̂, θ̂, and σ̂².
Examine the standardized residuals (x_t − x̂_t)/√(r_{t−1}), t = 1, …, n, where r_{t−1} = E(x_t − x̂_t)² and x̂_t = P_{t−1}x_t is the one-step predictor.
ARIMA MODELS
Definition
Let p, d, q be nonnegative integers. {x_t} is an ARIMA(p, d, q) process if y_t = (1 − B)^d x_t is a causal ARMA(p, q)
process.
Recall: an ARMA process is stationary if and only if φ(z) ≠ 0 for all |z| = 1, i.e. φ(z) has no unit root. If φ(z) ≠ 0 for all |z| ≤ 1, then x_t is causal and
x_t = Σ_{j=0}^{∞} ψ_j w_{t−j}
(MA(∞)).
Note
ARIMA(p, d, q): φ(B)(1 − B)^d x_t = θ(B)w_t, or equivalently, φ*(B)x_t = θ(B)w_t, where φ*(z) = φ(z)(1 − z)^d, so φ*(z) has a root (of order d) at z = 1.
Very specific type of non-stationarity.
Useful for slowly decaying sample ACF's where the data has a polynomial trend.
Unit Root Test
Can test the hypothesis H₀: φ = 1 in the AR(1) process (non-stationary random walk) vs H₁: |φ| < 1 (stationary).
Unit roots in φ(z) are very common in economic and financial time series.
If we reject H₀, then differencing is not necessary.
Tests can be extended to AR(p) processes: H₀: φ_1 + ⋯ + φ_p = 1.
In summary, if there is a unit root in φ(z), the data should be differenced; if there is a unit root in θ(z), the data has been over-differenced.
Strategy A
1. Difference data so it appears stationary.
2. Assume P = Q = 0; choose p and q using AIC as in the ARMA case.
3. Carry out diagnostics, i.e. white noise checks.
Strategy B
1. Difference data so it appears stationary.
2. Assume p = P = 0; choose q and Q by looking at the sample ACF ρ̂(1), ρ̂(2), …, and ρ̂(s), ρ̂(2s), …, and
choose q and Q to be the cut-off points.
3. Carry out diagnostics, i.e. white noise checks.
amplitudes: x_t = Σ_{k=1}^{K} [U_{k,1} cos(2πν_k t) + U_{k,2} sin(2πν_k t)], where the U_{k,1}, U_{k,2} are uncorrelated random
variables with E U_{k,1} = E U_{k,2} = 0 and var U_{k,1} = var U_{k,2} = σ_k².
Notation
ν is the frequency, in cycles per unit time; 1/ν is the period.
Example
x_t = A cos(2πνt + φ), where A and φ are independent random variables. We can rewrite x_t in regression form as
x_t = U_1 cos(2πνt) + U_2 sin(2πνt), where U_1 = A cos φ and U_2 = −A sin φ.
PERIODOGRAM
Identify possible periodicities or cycles in the data.
Definition: Fundamental Fourier Frequencies
Let ν_k = k/n, where k is an integer and n is the sample size. Let
F_n = {ν_k = k/n : k = −⌊(n − 1)/2⌋, …, ⌊n/2⌋}
be the set of fundamental Fourier frequencies.
Note: ⌊·⌋ denotes the greatest integer function.
Note: F_n ⊂ (−1/2, 1/2].
Let e_k = n^{−1/2}(e^{2πiν_k}, e^{2πi2ν_k}, …, e^{2πinν_k})ᵀ for ν_k ∈ F_n. It can be shown that {e_1, …, e_n} is an orthonormal basis for ℂⁿ. Any
vector x ∈ ℂⁿ can be written as x = Σ_k a_k e_k, where a_k = n^{−1/2} Σ_{t=1}^{n} x_t e^{−2πiν_k t}.
{ak } is called the discrete Fourier transform of the sequence { x1, , x n } (think of {xt } as the sample).
Note: From x = Σ_k a_k e_k, the t-th component is x_t = n^{−1/2} Σ_{k=−⌊(n−1)/2⌋}^{⌊n/2⌋} a_k e^{2πiν_k t} = n^{−1/2} Σ_k a_k (cos(2πν_k t) + i sin(2πν_k t)),
or equivalently, x_t = Σ_j [b_j cos(2π(j/n)t) + c_j sin(2π(j/n)t)], since sin w = (e^{iw} − e^{−iw})/(2i) and cos w = (e^{iw} + e^{−iw})/2. So
x_t, t = 1, …, n, can be expressed as an exact linear combination of sine waves with frequencies ν_k ∈ F_n.
Definition: Periodogram
Define I_n(ν) = (1/n)|Σ_{t=1}^{n} x_t e^{−2πiνt}|². At a Fourier frequency ν_k ∈ F_n,
I_n(ν_k) = (1/n)|Σ_{t=1}^{n} x_t e^{−2πiν_k t}|² = |a_k|² = (n^{−1/2} Σ_{t=1}^{n} x_t cos(2πν_k t))² + (n^{−1/2} Σ_{t=1}^{n} x_t sin(2πν_k t))².
Note
This measures the dependence (correlation) of the data x_1, …, x_n on sinusoids oscillating at a given frequency ν_k.
If data contains strong periodic behavior corresponding to a frequency k , then I n k will be large.
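The periodogram at all Fourier frequencies is exactly the squared modulus of the FFT divided by n; a sketch (my own; the notes index t = 1, …, n while NumPy's FFT uses t = 0, …, n − 1, which leaves the modulus unchanged):

```python
import numpy as np

def periodogram(x):
    """I_n(nu_k) = (1/n) |sum_t x_t exp(-2 pi i nu_k t)|^2 at nu_k = k/n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    a = np.fft.fft(x)              # a[k] = sum_t x[t] exp(-2 pi i k t / n)
    return (np.abs(a) ** 2) / n    # entry k corresponds to nu_k = k/n

# A pure cosine at nu = 10/128 should produce a spike at k = 10 (and k = n - 10)
n = 128
t = np.arange(n)
x = np.cos(2 * np.pi * 10 * t / n)
I = periodogram(x)
peak = int(np.argmax(I[1:n // 2])) + 1
```

For this noiseless cosine the spike value is I_n(ν_10) = n/4, and all other Fourier frequencies carry essentially zero.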
Note
If ν_k ≠ 0 is a Fourier frequency, then I_n(ν_k) = Σ_{h=−(n−1)}^{n−1} γ̂(h) e^{−2πiν_k h}.
SPECTRAL DENSITIES
Definition: Spectral Density
Suppose x_t is a zero-mean stationary time series with an absolutely summable ACV function γ, i.e. Σ_{h=−∞}^{∞} |γ(h)| < ∞. Then the spectral density is defined by
f(ν) = Σ_{h=−∞}^{∞} γ(h) e^{−2πiνh}, ν ∈ [−1/2, 1/2].
Note
Compare f with I_n; the periodogram is used to estimate the spectral density. Also, f(ν) = f(−ν), and
γ(h) = ∫_{−1/2}^{1/2} e^{2πiνh} f(ν) dν = ∫_{−1/2}^{1/2} cos(2πνh) f(ν) dν.
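A small numerical check of this inversion formula (my own example): for MA(1) with σ_w² = 1 and θ = 0.5, the autocovariances above give f(ν) = (1 + θ²) + 2θ cos(2πν), and integrating against cos(2πνh) should recover γ(0) = 1.25, γ(1) = 0.5, γ(2) = 0. Midpoint quadrature over one full period is essentially exact here:

```python
import numpy as np

theta = 0.5
N = 4096
# midpoint grid on [-1/2, 1/2]
nu = (np.arange(N) + 0.5) / N - 0.5

# MA(1) spectral density with sigma_w^2 = 1
f = (1 + theta**2) + 2 * theta * np.cos(2 * np.pi * nu)

# gamma(h) = integral of cos(2 pi nu h) f(nu) d nu, approximated by a midpoint sum
gamma = [np.sum(np.cos(2 * np.pi * nu * h) * f) / N for h in range(3)]
```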
Theorem
A function γ(·) is the ACV function of a stationary process if and only if there exists a right-continuous, non-decreasing, bounded function F on [−1/2, 1/2]
such that γ(h) = ∫_{−1/2}^{1/2} e^{2πiνh} dF(ν) for all h, with
F(−1/2) = 0 and F(1/2) = γ(0) = var x_t.
F is called the spectral distribution function. F is also called a generalized distribution function because:
1. G(ν) = F(ν)/F(1/2) is a probability distribution function.
2. If F is differentiable, then dF(ν)/dν = f(ν). In this case, γ(h) = ∫_{−1/2}^{1/2} e^{2πiνh} f(ν) dν.
SPECTRAL REPRESENTATION OF x_t
Theorem
Suppose x_t is a zero-mean stationary process with ACV γ and spectral distribution function F. Then there exist two
continuous real-valued processes C(ν) and S(ν), −1/2 ≤ ν ≤ 1/2, with
cov(C(ν_1), S(ν_2)) = 0 for all ν_1, ν_2,
var(C(ν_2) − C(ν_1)) = var(S(ν_2) − S(ν_1)) = F(ν_2) − F(ν_1) for ν_1 ≤ ν_2,
cov(C(ν) − C(ν′), C(λ) − C(λ′)) = cov(S(ν) − S(ν′), S(λ) − S(λ′)) = 0 for non-overlapping intervals (ν′, ν] and (λ′, λ],
such that
x_t = ∫_{−1/2}^{1/2} cos(2πνt) dC(ν) + ∫_{−1/2}^{1/2} sin(2πνt) dS(ν).
Heuristic: approximating these integrals by Riemann sums on the grid ν_j = j/n gives
x_t ≈ Σ_{j=−n/2}^{n/2} [A_j cos(2π(j/n)t) + B_j sin(2π(j/n)t)],
where A_j = C(j/n) − C((j − 1)/n) and B_j = S(j/n) − S((j − 1)/n).
The increments are mutually uncorrelated: cov(A_j, A_k) = cov(B_j, B_k) = 0 for j ≠ k, and cov(A_j, B_k) = 0 for all j, k. Also,
var A_k = var B_k = F(k/n) − F((k − 1)/n) = ∫_{(k−1)/n}^{k/n} f(ν) dν ≈ f(ν_k)/n, since f = dF/dν. Then we have
var x_t = Σ_k [var A_k cos²(2π(k/n)t) + var B_k sin²(2π(k/n)t)] ≈ ∫_{−1/2}^{1/2} f(ν) dν = γ(0).
So the spectral density f provides the decomposition of the variance of x_t by frequency.
FILTERING
Theorem: Filtering Theorem
Let y_t = Σ_{j=−∞}^{∞} ψ_j x_{t−j}, where Σ_{j=−∞}^{∞} |ψ_j| < ∞. Then
f_y(ν) = |Ψ(e^{−2πiν})|² f_x(ν), where Ψ(e^{−2πiν}) = Σ_{j=−∞}^{∞} ψ_j e^{−2πiνj}. Ψ is called the transfer function.
Examples
1. Moving average: y_t = Σ_{u=−r}^{r} c_u x_{t−u}, where c_u ≥ 0 and Σ_{u=−r}^{r} c_u = 1.
2. Differencing: y_t = ∇x_t = x_t − x_{t−1}, with Ψ(e^{−2πiν}) = 1 − e^{−2πiν}.
GARCH Modelling
ARCH MODELLING
Autoregressive conditionally heteroscedastic.
Model time-varying second moments, i.e. variances.
Very important for modelling financial time series.
Note
Here y_t = ln P_t − ln P_{t−1} is the log return, modelled as y_t = σ_t ε_t with ε_t ~ iid N(0, 1).
If σ_t = σ (constant), then y_t = σε_t. This is equivalent to ln P_t following a simple random walk,
ln P_t = ln P_{t−1} + σε_t.
Also, in this case y_t ~ iid N(0, σ²). This is not the case for ARCH(1), where σ_t² = α_0 + α_1 y²_{t−1}.
Note
E(y_t | y_{t−1}) = σ_t E(ε_t | y_{t−1}) = 0 since E ε_t = 0, and var(y_t | y_{t−1}) = σ_t² var(ε_t | y_{t−1}) = α_0 + α_1 y²_{t−1} since var ε_t = 1.
Thus y_t | y_{t−1} ~ N(0, σ_t²) (conditional distribution).
Writing y_t² = σ_t² ε_t² and subtracting σ_t² = α_0 + α_1 y²_{t−1}, we get y_t² = α_0 + α_1 y²_{t−1} + σ_t²(ε_t² − 1). Here ε_t² ~ χ²_1 with E(ε_t² − 1) = 0. Let u_t = σ_t²(ε_t² − 1).
Then E(u_t | y_s, s ≤ t − 1) = 0 and hence E u_t = E[E(u_t | y_s, s ≤ t − 1)] = 0. Therefore y_t² = α_0 + α_1 y²_{t−1} + u_t is a non-Gaussian AR(1) process.
By recursively substituting, we can write y_t² = α_0 Σ_{j=0}^{∞} α_1^j ε_t² ε²_{t−1} ⋯ ε²_{t−j}. Therefore E y_t² = α_0/(1 − α_1) if α_1 < 1. Also,
from this representation,
y_t = ε_t [α_0 (1 + Σ_{j=1}^{∞} α_1^j ε²_{t−1} ⋯ ε²_{t−j})]^{1/2}.
It can be shown that E y_t⁴ = 3α_0²(1 − α_1²)/[(1 − α_1)²(1 − 3α_1²)] if 3α_1² < 1. Therefore the kurtosis K of y_t is
K = E y_t⁴/(E y_t²)² = 3(1 − α_1²)/(1 − 3α_1²) ≥ 3.
Hence the solution yt of the ARCH(1) process is strictly stationary white noise. However, yt is not iid noise and is not
Gaussian.
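These moment formulas can be checked by simulation (my own sketch; α_0 = 1 and α_1 = 0.3 are made-up values chosen so that 3α_1² < 1):

```python
import numpy as np

def simulate_arch1(n, a0, a1, seed=0, burn=500):
    """ARCH(1): y_t = sigma_t * eps_t, sigma_t^2 = a0 + a1*y_{t-1}^2, eps_t ~ iid N(0,1)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        y[t] = np.sqrt(a0 + a1 * y[t - 1] ** 2) * eps[t]
    return y[burn:]

a0, a1 = 1.0, 0.3
y = simulate_arch1(200000, a0, a1, seed=5)

var_y = y.var()                                  # theory: a0/(1 - a1) = 1.43
kurt = np.mean(y**4) / var_y**2                  # theory: 3(1 - a1^2)/(1 - 3*a1^2) = 3.74
r_y = np.corrcoef(y[1:], y[:-1])[0, 1]           # y_t is white noise: near 0
r_y2 = np.corrcoef(y[1:]**2, y[:-1]**2)[0, 1]    # y_t^2 is AR(1): near a1 = 0.3
```

The sample kurtosis sits clearly above the Gaussian value 3, while y_t itself is serially uncorrelated and y_t² is not, exactly as the remarks describe.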
Remark
With ARCH(1), y_t² is an AR(1) process with heteroscedastic non-Gaussian noise. Thus y_t² is serially correlated; this
should show up on a time series plot and the sample ACF of y_t².
Also note that E σ_t² = α_0/(1 − α_1) = var y_t. This is the expected variance, or long-term average variance.
Remark
1. Regression with ARCH errors: x_t = β′z_t + y_t, where y_t ~ ARCH(1).
2. AR(p) with ARCH errors: x_t = φ_1 x_{t−1} + ⋯ + φ_p x_{t−p} + y_t, where y_t ~ ARCH(1).
3. ARCH(1) extends to ARCH(p): y_t = σ_t ε_t, where ε_t ~ iid N(0, 1) and σ_t² = α_0 + α_1 y²_{t−1} + ⋯ + α_p y²_{t−p}.
GARCH MODELLING
Definition: GARCH(1, 1) Model
y_t = σ_t ε_t, where ε_t ~ iid N(0, 1) and σ_t² = α_0 + α_1 y²_{t−1} + β_1 σ²_{t−1}, with α_0 > 0, α_1 ≥ 0, β_1 ≥ 0, α_1 + β_1 < 1.
Note
Substituting y²_{t−1} = σ²_{t−1} ε²_{t−1} gives σ_t² = α_0 + (α_1 ε²_{t−1} + β_1)σ²_{t−1}. Here ε²_{t−1} ~ χ²_1. Let u_t = σ_t²(ε_t² − 1). Then
y_t² = α_0 + (α_1 + β_1) y²_{t−1} + u_t − β_1 u_{t−1}. Hence y_t² is a non-Gaussian ARMA(1, 1) process with heteroscedastic noise.
Note
One can forecast volatility. Let σ_w² = α_0/(1 − α_1 − β_1). Then σ_t² can be expressed as σ_t² = σ_w² + α_1(y²_{t−1} − σ_w²) + β_1(σ²_{t−1} − σ_w²).
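A sketch of the variance recursion and the resulting k-step forecast (my own code with made-up parameter values; the forecast formula E[σ²_{t+k} | F_t] = σ_w² + (α_1 + β_1)^{k−1}(σ²_{t+1} − σ_w²) follows from iterating the deviation form in the note above):

```python
import numpy as np

def garch11_filter(y, a0, a1, b1):
    """Recover sigma_t^2 from returns via sigma_t^2 = a0 + a1*y_{t-1}^2 + b1*sigma_{t-1}^2,
    started at the long-run variance sw2 = a0 / (1 - a1 - b1)."""
    sw2 = a0 / (1.0 - a1 - b1)
    sigma2 = np.empty(len(y))
    sigma2[0] = sw2
    for t in range(1, len(y)):
        sigma2[t] = a0 + a1 * y[t - 1] ** 2 + b1 * sigma2[t - 1]
    return sigma2

def garch11_forecast(y_last, sigma2_last, a0, a1, b1, horizon):
    """k-step variance forecasts, decaying geometrically toward the long-run variance."""
    sw2 = a0 / (1.0 - a1 - b1)
    s_next = a0 + a1 * y_last ** 2 + b1 * sigma2_last   # sigma_{t+1}^2 is known at time t
    return np.array([sw2 + (a1 + b1) ** (k - 1) * (s_next - sw2)
                     for k in range(1, horizon + 1)])

a0, a1, b1 = 0.1, 0.1, 0.8   # hypothetical parameter values; long-run variance = 1
f = garch11_forecast(y_last=2.0, sigma2_last=1.5, a0=a0, a1=a1, b1=b1, horizon=50)
```

The forecast starts at σ²_{t+1} = 1.7 and decays monotonically toward the long-run variance 1, at rate α_1 + β_1 = 0.9 per step.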
Bivariate AR(p): x_t = Φ_1 x_{t−1} + ⋯ + Φ_p x_{t−p} + w_t, where x_t = (x_{t,1}, x_{t,2})ᵀ, w_t = (w_{t,1}, w_{t,2})ᵀ, and the Φ_i's are 2 × 2 (possibly diagonal) matrices. The cross-covariances are collected in the matrix Γ(h) = [γ_{i,j}(h)]_{i,j=1,2}.
Continuous-Time Models
Brownian Motion
1. B_t is continuous and B_0 = 0.
2. B_t ~ N(0, t).
3. B_{t+s} − B_t ~ N(0, s) for s > 0, and non-overlapping increments are independent.
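Properties 1-3 suggest the standard simulation by summing independent N(0, dt) increments (a sketch, not from the notes):

```python
import numpy as np

def brownian_motion(n_steps, T=1.0, seed=0):
    """Approximate B_t on [0, T]: B_0 = 0 and increments ~ iid N(0, dt)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate([[0.0], np.cumsum(increments)])

B = brownian_motion(10000, T=1.0, seed=2)
```

The terminal value B(1) is N(0, 1), and the empirical variance of the increments recovers dt.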
Continuous-Time AR(1)
dx_t = (b − ax_t)dt + σ dB_t. This is a stochastic differential equation (SDE).
Let I_t = ∫_0^t e^{−a(t−u)} dB_u. Then cov(I_{t+h}, I_t) = e^{−ah} ∫_0^t e^{−2au} du, t, h ≥ 0.
If E x_0 = b/a and var x_0 = σ²/(2a), then E x_t = b/a and cov(x_{t+h}, x_t) = (σ²/(2a)) e^{−ah}, t, h ≥ 0, and
hence x_t is stationary.
If x_0 ~ N(b/a, σ²/(2a)), then x_t is a strictly stationary Gaussian process.
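An Euler-Maruyama sketch of this SDE (my own illustration; the made-up values a = 2, b = 4, σ = 1 give stationary mean b/a = 2 and variance σ²/(2a) = 0.25):

```python
import numpy as np

def simulate_ou(n_steps, dt, a, b, sigma, x0, seed=0):
    """Euler-Maruyama discretization of dx_t = (b - a*x_t) dt + sigma dB_t."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))      # Brownian increment over [t, t + dt]
        x[i + 1] = x[i] + (b - a * x[i]) * dt + sigma * dB
    return x

a, b, sigma = 2.0, 4.0, 1.0
x = simulate_ou(100000, dt=0.01, a=a, b=b, sigma=sigma, x0=b / a, seed=9)
```

Over a long horizon the sample mean and variance settle near the stationary values 2 and 0.25 (up to a small O(dt) discretization bias).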