Fall 2011
http://www.stat.ncsu.edu/people/bloomfield/courses/st730/
For example:
Earnings per share of Johnson and Johnson stock (quarterly);
Global temperature anomalies from 1856 to 1997 (annual);
Investment returns on the New York Stock Exchange (daily).
Correlation
Since a wave is described in terms of its period, or alternatively its frequency, methods that measure the waves in a
time series are called frequency domain methods.
Statistical Models
The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions for sample data.
If the sampling times t_1, t_2, ... are equally spaced, their separation Δt = t_n − t_{n−1} is the sampling interval and 1/Δt is the sampling rate (samples per unit time).
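In R, a ts object carries this information; a small illustration (values arbitrary):

w = ts(rnorm(8), start = 2000, deltat = 0.25);  # quarterly sampling
deltat(w);     # sampling interval: 0.25
frequency(w);  # sampling rate: 4 samples per unit time
time(w);       # sampling times 2000.00, 2000.25, ...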
w = ts(rnorm(500));
plot(w);
Possible model:

v_t = (w_{t−1} + w_t + w_{t+1}) / 3
Moving Average
w = ts(rnorm(500));
v = filter(w, sides = 2, rep(1, 3) / 3);
plot(v);
Example: Autoregression

Recursive model:

x_t = x_{t−1} − 0.9 x_{t−2} + w_t,  t = 1, 2, ..., 500
Autoregression
w = ts(rnorm(500));
v = filter(w, filter = c(1, -0.9), method = "recursive");
plot(v);
Explicitly:

x_t = δt + Σ_{j=1}^t w_j
Random Walk
[Figure: simulated random walks, t = 1, ..., 500.]
Example: signal plus noise,

x_t = 2 cos(2πt/50 + 0.6π) + w_t,  t = 1, 2, ..., 500:
w = ts(rnorm(500));
x = 2 * cos(2 * pi * time(w) / 50 + 0.6 * pi) + w;
plot(x);
Means

The mean function is

μ_{x,t} = E(x_t) = ∫ x f_t(x) dx,

where the expectation is for the given t, across all the possible values of x_t. Here f_t(·) is the pdf of x_t.
For the moving average,

v_t = (w_{t−1} + w_t + w_{t+1}) / 3,

so

μ_{v,t} = E(v_t) = [E(w_{t−1}) + E(w_t) + E(w_{t+1})] / 3 = 0.
For the random walk with drift,

x_t = δt + Σ_{j=1}^t w_j,

so

μ_{x,t} = E(x_t) = δt + Σ_{j=1}^t E(w_j) = δt.
For the signal plus noise,

μ_{x,t} = E(x_t) = 2 cos(2πt/50 + 0.6π) + E(w_t) = 2 cos(2πt/50 + 0.6π),

the (cosine wave) signal.
Covariances

The autocovariance function is, for all s and t,

γ_x(s, t) = E[(x_s − μ_{x,s})(x_t − μ_{x,t})].

If w_t is white noise wn(0, σ_w²), then

γ_w(s, t) = E(w_s w_t) = σ_w²  if s = t,  and 0  if s ≠ t.
definitely choppy!
[Figure: the autocovariance surface γ_w(s, t).]
For the moving average,

v_t = (w_{t−1} + w_t + w_{t+1}) / 3

and E(v_t) = 0, so

γ_v(s, t) = E(v_s v_t)
          = (1/9) E[(w_{s−1} + w_s + w_{s+1})(w_{t−1} + w_t + w_{t+1})]
          = (3/9) σ_w²  if s = t,
            (2/9) σ_w²  if |s − t| = 1,
            (1/9) σ_w²  if |s − t| = 2,
            0           otherwise.
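A quick simulation check (series length arbitrary): since ρ(1) = 2/3 and ρ(2) = 1/3, the sample ACF of the filtered series should be near those values.

w = ts(rnorm(100000));
v = filter(w, sides = 2, rep(1, 3) / 3);
acf(v, lag.max = 4, na.action = na.pass, plot = FALSE);  # approx 1, 2/3, 1/3, 0, 0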
[Figure: the autocovariance surface γ_v(s, t).]
For the random walk (without drift),

x_t = Σ_{j=1}^t w_j

and E(x_t) = 0, so

γ_x(s, t) = E(x_s x_t) = E[(Σ_{j=1}^s w_j)(Σ_{k=1}^t w_k)] = min{s, t} σ_w².
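A simulation sketch (number of replicates arbitrary): the variance of x_t across replicated random walks should grow like t σ_w².

nrep = 1000; n = 100;
x = apply(matrix(rnorm(nrep * n), nrep, n), 1, cumsum);  # columns are random walks
plot(1:n, apply(x, 1, var), xlab = "t", ylab = "sample var of x_t");
abline(0, 1);  # theoretical line t * sigma_w^2, with sigma_w^2 = 1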
[Figure: the autocovariance surface γ_x(s, t).]
Notes:

For the first two models, γ_x(s, t) depends on s and t only through |s − t|, but for the random walk γ_x(s, t) depends on s and t separately.

For the first two models, the variance γ_x(t, t) is constant, but for the random walk γ_x(t, t) = t σ_w² increases indefinitely as t increases.
Correlations

The autocorrelation function is

ρ(s, t) = γ(s, t) / √[γ(s, s) γ(t, t)].
Across Series

The cross-covariance function is

γ_{x,y}(s, t) = E[(x_s − μ_{x,s})(y_t − μ_{y,t})],

and so on for the cross-correlation.
Simplifications

For a stationary series,

ρ(t + h, t) = γ(t + h, t) / √[γ(t + h, t + h) γ(t, t)] = γ(h) / γ(0).
In other statistical applications, means, variances, and covariances are estimated by averaging across samples.
Mean

If x_t is stationary, μ_t = E(x_t) = μ, so we can estimate μ by the sample mean

x̄ = (1/n) Σ_{t=1}^n x_t.

We could also use a weighted mean

Σ_{t=1}^n w_t x_t,  where  Σ_{t=1}^n w_t = 1.
Autocovariance

γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)

for h = 0, 1, ..., n − 1, with γ̂(−h) = γ̂(h). The sample autocorrelation is

ρ̂(h) = γ̂(h) / γ̂(0).
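A sketch computing γ̂(h) directly and checking it against R's acf(), which uses the same divisor n:

x = rnorm(200); n = length(x); xbar = mean(x); h = 3;
sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n;
acf(x, type = "covariance", plot = FALSE)$acf[h + 1];  # same value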
Sampling Properties

x̄ is unbiased for μ.

γ̂(h) is not unbiased for γ(h), but the alternative with divisor n − h,

[1/(n − h)] Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄),

is closer to unbiased.
Non-negative Definiteness

The covariance matrix of (x_1, x_2, ..., x_k)′ is

Γ_k = [ γ(0)     γ(1)     ...  γ(k − 1)
        γ(1)     γ(0)     ...  γ(k − 2)
        ...      ...      ...  ...
        γ(k − 1) γ(k − 2) ...  γ(0)     ],

and

a′ Γ_k a = var(a_1 x_1 + a_2 x_2 + ... + a_k x_k) ≥ 0

for any vector of constants a = (a_1, a_2, ..., a_k)′.

With the above definition of γ̂(h), Γ̂_k is also non-negative definite; that would not be true if we divided by (n − h).
R Examples
White noise:
acf(ts(rnorm(100)));
SAS Example

Southern Oscillation Index and fish recruitment:

options pagesize = 80;
data soi;
    infile 'soi.dat';
    input soi;
run;
data recruit;
    infile 'recruit.dat';
    input recruit;
run;
data both;
    time + 1;
    merge soi recruit;
run;
proc gplot data = both;
    symbol i = join;
    plot (soi recruit) * time;
run;
proc arima data = both;
    title 'SOI and recruitment';
    identify var = soi nlag = 50;
    identify var = recruit crosscorr = soi nlag = 50;
    /* Positive lags indicate SOI leads recruitment. */
run;
Replacing recruit with a corresponding recruitSA makes negligible changes to the ACF and CCF.
Vector-Valued Series: Notation

x_t = (x_{t,1}, x_{t,2}, ..., x_{t,p})′
Mean Vector

μ = E(x_t) = (E x_{t,1}, E x_{t,2}, ..., E x_{t,p})′ = (μ_1, μ_2, ..., μ_p)′
Autocovariance Matrix

Γ(h) = E[(x_{t+h} − μ)(x_t − μ)′] =
  [ γ_{1,1}(h)  γ_{1,2}(h)  ...  γ_{1,p}(h)
    γ_{2,1}(h)  γ_{2,2}(h)  ...  γ_{2,p}(h)
    ...         ...         ...  ...
    γ_{p,1}(h)  γ_{p,2}(h)  ...  γ_{p,p}(h) ]
sample mean:

x̄ = (1/n) Σ_{t=1}^n x_t

sample autocovariance:

Γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)′

for h ≥ 0, and Γ̂(−h) = Γ̂(h)′.
Soil temperatures

[Figure: perspective plot of soil temperature over a grid of rows and columns.]

For such spatial data the autocovariance function compares x_{s+h} with x_s, where the index s and lag h are vectors.
Time domain modeling: the inputs often include lagged values of the same series, x_{t−1}, x_{t−2}, ..., x_{t−p}.

Frequency domain modeling: the inputs include sine and cosine functions.
Fitting a Trend

[Figure: global temperature anomalies, 1900 to 2000.]
Possible model:

x_t = β_1 + β_2 t + w_t,

where the error (noise) is white noise (unlikely!).
fit using ordinary least squares (OLS):
> lmg1900 = lm(g1900 ~ time(g1900)); summary(lmg1900)
Call:
lm(formula = g1900 ~ time(g1900))

Residuals:
     Min       1Q   Median       3Q      Max
-0.30352 -0.09671  0.01132  0.08289  0.33519

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.219e+01  9.032e-01  -13.49   <2e-16 ***
time(g1900)  6.209e-03  4.635e-04   13.40   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1298 on 96 degrees of freedom
Multiple R-Squared: 0.6515,     Adjusted R-squared: 0.6479
F-statistic: 179.5 on 1 and 96 DF,  p-value: < 2.2e-16
> plot(g1900)
> abline(reg = lmg1900)

[Figure: g1900 with the fitted trend line, 1900 to 2000.]
Regression Review

The regression model:

x_t = β_1 z_{t,1} + β_2 z_{t,2} + ... + β_q z_{t,q} + w_t = β′z_t + w_t.

Fit by minimizing the residual sum of squares

RSS(β) = Σ_{t=1}^n (x_t − β′z_t)².

The minimizer is

β̂ = (Σ_{t=1}^n z_t z_t′)^{−1} Σ_{t=1}^n z_t x_t.
Matrix Formulation

RSS(β) = (x − Zβ)′(x − Zβ),

minimized at β̂ = (Z′Z)^{−1} Z′x, with minimized value

RSS = x′x − x′Z(Z′Z)^{−1}Z′x.
Distributions

If the errors are Gaussian white noise, β̂ is normally distributed with mean β and covariance σ_w²(Z′Z)^{−1}. If the errors are not normally distributed, but still iid, the same is approximately true.
We want a model that fits well without using too many parameters: small σ̂² but also small q.
Akaike's Information Criterion:

AIC = log σ̂_k² + (n + 2k)/n.
Notes

More commonly (e.g. in SAS output and in R's AIC function), these are all multiplied by n.

AIC, AICc, and SIC (also known as SBC and BIC) can be generalized to other problems where likelihood methods are used.

If n is large and the true k is small, minimizing BIC picks k well, but minimizing AIC tends to over-estimate it (see the sketch below).

If the true k is large (or infinite), minimizing AIC picks a value that gives good predictions by trading off bias vs variance.
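A sketch of the comparison (data and orders arbitrary), using R's AIC() with the default penalty and with k = log(n) for BIC:

x = arima.sim(list(ar = c(0.7, -0.3)), n = 200);  # true order k = 2
ics = sapply(0:6, function(k) {
    fit = arima(x, order = c(k, 0, 0));
    c(AIC = AIC(fit), BIC = AIC(fit, k = log(length(x))));
});
colnames(ics) = 0:6;
round(ics, 1);  # compare the minimizing order for each criterion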
The form of trend might be linear, or higher degree polynomial, or some other function suggested by theory.
[Figure: residuals from the linear trend fit, 1900 to 2000.]
Differencing

Suppose the trend is a random walk with drift,

μ_t = δt + Σ_{j=1}^t w_j,

so that with x_t = μ_t + y_t,

x_t − E(x_t) = Σ_{j=1}^t w_j + y_t.

Differencing removes the stochastic trend: ∇x_t = δ + w_t + ∇y_t.
plot(diff(g1900));

[Figure: first differences of g1900.]
acf(diff(g1900));

[Figure: ACF of diff(g1900).]
acf(residuals(lmg1900))

[Figure: ACF of the residuals from the trend fit.]
Transformation (Re-expression)
Most commonly logarithms, sometimes square roots (especially with counted data).
Periodic Signals

[Figures lost in extraction.]
Autoregressive Models

Example: AR(1)

x_t = φx_{t−1} + w_t.

Also

x_{t−1} = φx_{t−2} + w_{t−1},

so

x_t = φ(φx_{t−2} + w_{t−1}) + w_t = φ²x_{t−2} + φw_{t−1} + w_t.

Now use

x_{t−2} = φx_{t−3} + w_{t−2},

so

x_t = φ²(φx_{t−3} + w_{t−2}) + φw_{t−1} + w_t = φ³x_{t−3} + φ²w_{t−2} + φw_{t−1} + w_t.

Continuing, we have shown:

x_t = φ^k x_{t−k} + Σ_{j=0}^{k−1} φ^j w_{t−j}.

If |φ| < 1, letting k → ∞ gives

x_t = Σ_{j=0}^∞ φ^j w_{t−j}.
Moments

Mean: E(x_t) = 0.

Autocovariances: for h ≥ 0,

γ(h) = cov(x_{t+h}, x_t) = E[(Σ_j φ^j w_{t+h−j})(Σ_k φ^k w_{t−k})] = σ_w² φ^h / (1 − φ²).

Autocorrelations: for h ≥ 0,

ρ(h) = γ(h)/γ(0) = φ^h.

Note that

ρ(h) = φ ρ(h − 1),  h = 1, 2, ...
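As a check, R's ARMAacf() reproduces ρ(h) = φ^h:

phi = 0.9;
rbind(ARMAacf(ar = phi, lag.max = 5), phi^(0:5));  # identical rows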
Simulations

[Figures: simulated AR(1) series.]
Causality

For |φ| > 1 the stationary solution depends on the future,

x_t = −Σ_{j=1}^∞ φ^{−j} w_{t+j},

and so is not causal. For |φ| < 1 the solution is causal:

x_t = Σ_{j=0}^∞ φ^j w_{t−j} = Σ_{j=0}^∞ φ^j B^j w_t.
So

(1 − φB)^{−1} = Σ_{j=0}^∞ φ^j B^j.

Compare with

(1 − φz)^{−1} = 1/(1 − φz) = Σ_{j=0}^∞ φ^j z^j = ψ(z),

where ψ(z) = Σ_{j=0}^∞ ψ_j z^j with ψ_j = φ^j.
Hence

x_t = Σ_{j=0}^∞ ψ_j w_{t−j} = Σ_{j=0}^∞ φ^j w_{t−j},  t = 1, 2, ...,

with

var(x_t) = σ_w² / (1 − φ²).

But note: in the stationary version, x_0 ∼ N(0, σ_w²/(1 − φ²)).
In operator form:

x_t = θ(B)w_t,

where the moving average operator θ(B) is

θ(B) = 1 + θ_1 B + θ_2 B² + ... + θ_q B^q.
Moments

Mean: E(x_t) = 0.

Autocovariances:

γ(h) = cov(x_{t+h}, x_t) = E[(Σ_j θ_j w_{t+h−j})(Σ_k θ_k w_{t−k})] = σ_w² Σ_k θ_k θ_{k+h},

which is 0 if h > q. Hence also

ρ(h) = 0  for h > q.
Inversion

Example: MA(1)

x_t = w_t + θ w_{t−1} = (1 + θB)w_t,

so if |θ| < 1,

w_t = (1 + θB)^{−1} x_t = π(B)x_t,

where

π(B) = Σ_{j=0}^∞ (−θ)^j B^j.

Equivalently,

x_t = −Σ_{j=1}^∞ (−θ)^j x_{t−j} + w_t.
In operator form:

φ(B)x_t = θ(B)w_t.

Parameter redundancy: if φ(z) and θ(z) have any common factors, they can be canceled out, so the model is the same as one with lower orders. We assume no redundancy.
For the MA(1) model, the Autocorrelation Check of Residuals rejects the null hypothesis that the residuals are white noise. If the series really had MA(1) structure, the residuals would be white noise, so the MA(1) model is not a good fit for this series.

For both the MA(2) and the ARMA(1, 1) models, the Chi-Square statistics are not significant, so both models seem satisfactory. ARMA(1, 1) has the better AIC and SBC.
Using R

But note that you cannot include the intercept, so the results are not identical. Rerun the original analysis with no intercept:

arima(diff(log(varve)), order = c(0, 0, 1),
    include.mean = FALSE);
For the MA(q) model the same computation gives

γ(h) = E[(Σ_{j=0}^q θ_j w_{t+h−j})(Σ_{k=0}^q θ_k w_{t−k})]
     = σ_w² Σ_{j=0}^{q−h} θ_j θ_{j+h},  0 ≤ h ≤ q,
     = 0,  h > q.

So the ACF is

ρ(h) = (Σ_{j=0}^{q−h} θ_j θ_{j+h}) / (Σ_{j=0}^q θ_j²),  0 ≤ h ≤ q,
     = 0,  h > q.
Notes:

In these expressions, θ_0 = 1 for convenience.

ρ(q) ≠ 0 but ρ(h) = 0 for h > q. This characterizes MA(q).
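For example, the theoretical ACF of an MA(2) (coefficients arbitrary) cuts off after lag 2:

ARMAacf(ma = c(0.5, -0.4), lag.max = 5);  # nonzero at lags 1-2, zero beyond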
For the AR(p) model,

γ(h) = cov(x_{t+h}, x_t)
     = E[(Σ_{j=1}^p φ_j x_{t+h−j} + w_{t+h}) x_t]
     = Σ_{j=1}^p φ_j γ(h − j) + cov(w_{t+h}, x_t).

Now

cov(w_{t+h}, x_t) = σ_w²  if h = 0,  and 0  if h > 0.

Hence

γ(h) = Σ_{j=1}^p φ_j γ(h − j),  h > 0,

and

γ(0) = Σ_{j=1}^p φ_j γ(j) + σ_w².

For ARMA(p, q) the same recursion holds at higher lags:

γ(h) = Σ_{j=1}^p φ_j γ(h − j),  h > q.
The recursion has solutions of the form γ(h) = z^{−h}: substituting,

0 = z^{−h} − Σ_{j=1}^p φ_j z^{−(h−j)} = z^{−h} (1 − Σ_{j=1}^p φ_j z^j) = z^{−h} φ(z),

so the general solution is

γ(h) = Σ_{l=1}^p c_l z_l^{−h},

where z_1, ..., z_p are the roots of φ(z).
Example: ARMA(1, 1)

x_t = φ x_{t−1} + θ w_{t−1} + w_t.

The recursion is

γ(h) = φ γ(h − 1),  h = 2, 3, ...
[Figure: autocorrelations of an ARMA(1, 1), decaying geometrically.]

The partial autocorrelation function (PACF) of an AR(p) is zero for lags h > p.
For the ARMA model φ(B)x_t = θ(B)w_t:

        AR(p)                  MA(q)                  ARMA(p, q)
ACF     Tails off              Cuts off after lag q   Tails off
PACF    Cuts off after lag p   Tails off              Tails off
IACF    Cuts off after lag p   Tails off              Tails off

Both produce tables in which the pattern of zero and nonzero values characterizes p and q.
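A sketch in R (orders and coefficients arbitrary) illustrating the cut-off versus tail-off patterns:

par(mfrow = c(2, 2));
x = arima.sim(list(ar = 0.8), n = 500);           # AR(1)
acf(x); pacf(x);   # ACF tails off; PACF cuts off after lag 1
y = arima.sim(list(ma = c(0.6, 0.4)), n = 500);   # MA(2)
acf(y); pacf(y);   # ACF cuts off after lag 2; PACF tails off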
Forecasting

General problem: predict x_{n+m} given x_n, x_{n−1}, ..., x_1.

General solution: the (conditional) distribution of x_{n+m} given x_n, x_{n−1}, ..., x_1. In particular, the conditional mean is the best predictor (i.e. minimum mean squared error).

Special case: if {x_t} is Gaussian, the conditional distribution is also Gaussian, with a conditional mean that is a linear function of x_n, x_{n−1}, ..., x_1 and a conditional variance that does not depend on x_n, x_{n−1}, ..., x_1.
Linear Forecasting

One-step Prediction

The hard way: suppose

x̂_{n+1}^n = φ_{n,1} x_n + φ_{n,2} x_{n−1} + ... + φ_{n,n} x_1.

Choose φ_{n,1}, φ_{n,2}, ..., φ_{n,n} to minimize the mean squared prediction error

E[(x_{n+1} − x̂_{n+1}^n)²].
E.g. AR(p), p ≤ n:

x_{n+1} = φ_1 x_n + φ_2 x_{n−1} + ... + φ_p x_{n+1−p}  (first part)  +  w_{n+1}  (second part).

For ARMA(p, q),

x_{n+1} = φ_1 x_n + φ_2 x_{n−1} + ... + φ_p x_{n+1−p}
        + θ_1 w_n + θ_2 w_{n−1} + ... + θ_q w_{n+1−q}
        + w_{n+1}.

Middle part? If the model is invertible, w_t is a linear combination of x_t, x_{t−1}, ..., so if n is large, we can truncate the sum at x_1, and w_n, w_{n−1}, ..., w_{n+1−q} are all (approximately) linear combinations of x_n, x_{n−1}, ..., x_1.
Multi-step Prediction

The first two parts are again (approximately) linear combinations of x_n, x_{n−1}, ..., x_1, and the last is uncorrelated with x_n, x_{n−1}, ..., x_1. So

x̂_{n+2}^n = φ_1 x̂_{n+1}^n + φ_2 x_n + ... + φ_p x_{n+2−p} + θ_2 w_n + ... + θ_q w_{n+2−q}.
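In R, predict() on a fitted arima object gives these forecasts and their standard errors; a sketch on simulated data:

x = arima.sim(list(ar = 0.8, ma = 0.4), n = 200);
fit = arima(x, order = c(1, 0, 1));
predict(fit, n.ahead = 5);  # $pred = forecasts, $se = prediction standard errors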
x_t = Φ x_{t−1} + w_t,

where x_t is p × 1 and Φ is p × p. Here each component time series is typically ARMA(p, p − 1).

Summary: you'll often find that you can use small p and q ≤ p, perhaps q = 0 or q = p − 1 or q = p, depending on the background of the series.
Estimation

For the AR(1) model, f_1(x_1) is N[μ, σ_w²/(1 − φ²)], and the likelihood is proportional to

(σ_w²)^{−n/2} (1 − φ²)^{1/2} exp[−S(μ, φ) / (2σ_w²)],

where

S(μ, φ) = (1 − φ²)(x_1 − μ)² + Σ_{t=2}^n [(x_t − μ) − φ(x_{t−1} − μ)]².

method = uls: minimize S(μ, φ).

method = cls: minimize the conditional sum of squares

Σ_{t=2}^n [(x_t − μ) − φ(x_{t−1} − μ)]².
Brute Force

The full covariance matrix of (x_1, ..., x_n)′ is the n × n matrix

Γ_n = [ γ(0)     γ(1)     γ(2)     ...  γ(n − 1)
        γ(1)     γ(0)     γ(1)     ...  γ(n − 2)
        γ(2)     γ(1)     γ(0)     ...  γ(n − 3)
        ...      ...      ...      ...  ...
        γ(n − 1) γ(n − 2) γ(n − 3) ...  γ(0)     ].
R's ARMAacf(...) can be used to compute the autocorrelations needed to build Γ_n.
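A sketch building Γ_n for an AR(1) (parameter values arbitrary):

phi = 0.9; sigma2 = 1; n = 5;
rho = ARMAacf(ar = phi, lag.max = n - 1);      # autocorrelations 1, phi, ..., phi^(n-1)
Gamma = toeplitz(rho) * sigma2 / (1 - phi^2);  # gamma(0) = sigma_w^2 / (1 - phi^2)
Gamma;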
Likelihood is

[det(2πΓ)]^{−1/2} exp[−½ (x − μ1)′ Γ^{−1} (x − μ1)]
  = [det(2πσ_w²V)]^{−1/2} exp[−(x − μ1)′ V^{−1} (x − μ1) / (2σ_w²)],

where Γ = σ_w²V. Can maximize analytically with respect to μ and σ_w², then numerically with respect to φ and θ.
Under-differencing
Over-differencing
x̂_{n+1} = (1 − λ) Σ_{j=0}^∞ λ^j x_{n−j},
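Equivalently x̂_{n+1} = (1 − λ)x_n + λx̂_n, which can be computed with a recursive filter; a sketch (λ arbitrary, start-up effect ignored):

lambda = 0.8;
x = cumsum(rnorm(200));
xhat = filter((1 - lambda) * x, lambda, method = "recursive");
xhat[length(x)];  # one-step-ahead forecast of the next observation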
1. First choose d:
ACF of an integrated series tends to die away slowly, so
difference until it dies away quickly;
the IACF of a non-invertible series tends to die away
slowly, which indicates over-differencing.
You may want to try more than one value of d.
E.g. Johnson & Johnson quarterly earnings; discussion typically focuses on comparison with:

the previous quarter;

the same quarter, previous year.
Seasonal ARMA model

Suppose x_t is ARMA(p, q):

φ_p(B) x_t = θ_q(B) w_t.

Then the seasonal version satisfies

Φ_P(B^s) φ_p(B) x_t = Θ_Q(B^s) θ_q(B) w_t.

The non-seasonal parts φ_p and θ_q control short-term correlations (up to half a season, lag s/2), while the seasonal parts Φ_P and Θ_Q control the decay of the correlations over multiple seasons.
Note: the original fit of the straight line and seasonal dummies was by OLS:

possibly inefficient;

invalid inferences (standard errors, etc.).

Solution: refit as part of the time series model.
x = model.matrix( ~ time(jj) + factor(cycle(jj)))
jja = arima(log(jj), order = c(2, 0, 0),
seasonal = list(order = c(1, 0, 0), period = 4),
xreg = x, include.mean = FALSE)
print(jja)
tsdiag(jja)
Notes:

The time series being fitted is the original unadjusted log(jj).

The regressors are specified as the matrix argument xreg.

arima does not check for linear dependence, so we must either omit one dummy variable from xreg or use include.mean = FALSE in arima.

Regression parameter estimates are similar to OLS, but standard errors are roughly doubled.
Using SAS: proc arima program and output.
ARIMA model

Φ_P(B^s) φ_p(B) ∇_s^D ∇^d x_t = Θ_Q(B^s) θ_q(B) w_t.
E.g. AR(2):
plot(ts(arima.sim(list(order = c(2,0,0), ar = c(1.5,-.95)), n = 144)))
Cyclical Behavior

Simplest case is the periodic process

x_t = A cos(2πωt + φ),  t = 0, 1, 2, ...
    = U_1 cos(2πωt) + U_2 sin(2πωt),

where:

A is amplitude;

ω is frequency, in cycles per sample;

φ is phase, in radians;

and U_1 = A cos(φ), U_2 = −A sin(φ).
# Aliasing: sampled at integer t, frequency 1 - omega is indistinguishable
# from omega (phi value assumed; the original value was lost in extraction):
omega = 0.8; phi = 0.6 * pi;
curve(cos(2 * pi * omega * x + phi), 0, 10, n = 501, col = "blue");
curve(cos(2 * pi * (1 - omega) * x - phi), 0, 10, n = 501, add = TRUE, col = "red");
points(0:10, cos(2 * pi * omega * (0:10) + phi));
Note:

ω = 0.8 = 0.5 + 0.3, and 1 − ω = 0.2 = 0.5 − 0.3; 1 − ω is ω folded around 0.5.
Stationarity

If

x_t = A cos(2πωt + φ) = U_1 cos(2πωt) + U_2 sin(2πωt)

and φ is random, uniformly distributed on [0, 2π), then:

E(x_t) = 0,
E(x_{t+h} x_t) = ½ A² cos(2πωh).

So x_t is weakly stationary.
Also

E(U_1) = E(U_2) = 0,  E(U_1²) = E(U_2²) = ½ A²,  and  E(U_1 U_2) = 0.

Alternatively, if the U's have these properties, x_t is stationary with the same mean and autocovariances:

E(x_t) = 0,  E(x_{t+h} x_t) = ½ A² cos(2πωh).
More generally, if

x_t = Σ_{k=1}^q [U_{k,1} cos(2πω_k t) + U_{k,2} sin(2πω_k t)],

where:

the U's are uncorrelated with zero mean;

var(U_{k,1}) = var(U_{k,2}) = σ_k²;

then x_t is stationary with zero mean and autocovariances

γ(h) = Σ_{k=1}^q σ_k² cos(2πω_k h).
Harmonic Analysis

Any time series sample x_1, x_2, ..., x_n can be written

x_t = a_0 + Σ_{j=1}^{(n−1)/2} [a_j cos(2πjt/n) + b_j sin(2πjt/n)].
R examples:
par(mfcol = c(2, 1));
# one frequency:
x = cos(2*pi*(0.123)*(1:144))
plot.ts(x); spectrum(x, log = "no")
# and a second frequency:
x = x + 2 * cos(2*pi*(0.234)*(1:144))
plot.ts(x); spectrum(x, log = "no")
# and added noise:
x = x + rnorm(144)
plot.ts(x); spectrum(x, log = "no")
# the AR(2) series:
x = ts(arima.sim(list(order = c(2,0,0), ar = c(1.5,-.95)), n = 144))
plot(x); spectrum(x, log = "no")
[Several lectures here, on spectral analysis (spectral densities, the discrete Fourier transform, the periodogram and its properties, and smoothed spectral estimation), were corrupted in extraction and could not be recovered.]
[Figures: estimated spectral density s(f) plotted against period in years (periods 1, 0.5, 0.25, 0.167).]
Tapering
The periodogram works well with data containing only Fourier
frequencies:
w = rnorm(128, sd = 0.01);
x5 = cos(2*pi*(5/128)*(1:128)) + w;
x6 = cos(2*pi*(6/128)*(1:128)) + w;
par(mfcol = c(3, 1), mar = c(2, 2, 1, 1));
spectrum(x5, taper = 0, ylim = c(1e-7, 1e2));
spectrum(x6, taper = 0, ylim = c(1e-7, 1e2));
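The third panel set up by par(mfcol = c(3, 1)) is not preserved above; a plausible version repeats the second estimate with a taper applied:

spectrum(x6, taper = 0.5, ylim = c(1e-7, 1e2));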
Tapering makes the main peak wider, but much reduces side
lobes.
[Further spectral-analysis lectures, on cross-spectra, squared coherence, phase, linear filters, and frequency response functions, were corrupted in extraction and could not be recovered.]
Lagged regression

The fisheries recruitment series (y_t) and the Southern Oscillation Index (x_t) are cross-correlated with lags of several months. Perhaps we can model them as

y_t = Σ_{r=−∞}^∞ β_r x_{t−r} + v_t.
In terms of filters:

z_t = Σ_{r=−∞}^∞ β_r x_{t−r},

with

B(ν) = Σ_{r=−∞}^∞ β_r e^{−2πiνr},

the spectrum of z_t is

f_zz(ν) = |B(ν)|² f_xx(ν).

So the spectrum of y_t is

f_yy(ν) = f_zz(ν) + f_vv(ν) = |B(ν)|² f_xx(ν) + f_vv(ν),

and the cross spectrum of y_t and x_t is

f_yx(ν) = f_zx(ν) = B(ν) f_xx(ν).
3
fyx()
.
fxx()
1/2
fyx()
d < ,
|B()|d =
1/2
1/2 fxx ()
the coefficients are
1/2
r =
1/2
n1
e2ir B()d
1
e2ik r B(k ).
n k=0
Interpreting Coherence

Recall that

f_yy(ν) = |B(ν)|² f_xx(ν) + f_vv(ν)
        = |f_yx(ν) / f_xx(ν)|² f_xx(ν) + f_vv(ν)
        = ρ²_{yx}(ν) f_yy(ν) + f_vv(ν).

So

f_vv(ν) = [1 − ρ²_{yx}(ν)] f_yy(ν).

The squared coherence is the proportion of the spectrum of y_t that is explained by the lagged regression on x_t.
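A simulation sketch (filter and noise level arbitrary): the estimated squared coherence from spec.pgram is highest at frequencies where the filtered signal dominates the noise.

x = rnorm(1024);
y = as.numeric(filter(x, rep(1, 5) / 5, sides = 1)) + rnorm(1024, sd = 0.5);
xy = ts(cbind(x, y)[-(1:4), ]);  # drop initial NAs from the one-sided filter
sr = spec.pgram(xy, spans = c(7, 7), plot = FALSE);
plot(sr$freq, sr$coh, type = "l", ylab = "squared coherence");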
Forecasting

The one-step predictor from the infinite past has the form

x̂_t = Σ_{r=1}^∞ a_r x_{t−r},

for some coefficients a_r, r = 1, 2, ...
That is, w_t = x_t − x̂_t is uncorrelated with all past x's, and hence with all past w's, and hence is white noise.

So the filter

w_t = x_t − Σ_{r=1}^∞ a_r x_{t−r} = Σ_{r=0}^∞ π_r x_{t−r},

with π_0 = 1 and π_r = −a_r, gives

σ_w² = f_xx(ν) (Σ_{r=0}^∞ π_r e^{−2πiνr}) (Σ_{r=0}^∞ π_r e^{2πiνr}).
Taking logarithms,

log[f_xx(ν)] = log σ_w² − log(Σ_{r=0}^∞ π_r e^{−2πiνr}) − log(Σ_{r=0}^∞ π_r e^{2πiνr}).
If

∫_{−1/2}^{1/2} |log[f_xx(ν)]| dν < ∞,

we can write

log[f_xx(ν)] = l_0 + 2 Σ_{r=1}^∞ l_r cos(2πνr)
             = l_0 + Σ_{r=1}^∞ l_r e^{−2πiνr} + Σ_{r=1}^∞ l_r e^{2πiνr}.
Matching terms,

log(Σ_{r=0}^∞ π_r e^{−2πiνr}) = −Σ_{r=1}^∞ l_r e^{−2πiνr},

log(Σ_{r=0}^∞ π_r e^{2πiνr}) = −Σ_{r=1}^∞ l_r e^{2πiνr}.
That is,

σ_w² = exp(l_0) = exp( ∫_{−1/2}^{1/2} log[f_xx(ν)] dν ),

and

Σ_{r=0}^∞ π_r e^{−2πiνr} = exp( −Σ_{r=1}^∞ l_r e^{−2πiνr} ),

whence for r = 1, 2, ...

π_r = ∫_{−1/2}^{1/2} exp( −Σ_{s=1}^∞ l_s e^{−2πiνs} ) e^{2πiνr} dν.
Why do we care?

Write the mean of x_1, x_2, ..., x_n as

x̄_n = (x_1 + x_2 + ... + x_n) / n.

Then

var(x̄_n) = (1/n) Σ_{h=−(n−1)}^{n−1} (1 − |h|/n) γ(h)
          = (1/n) Σ_{h=−∞}^∞ (1 − |h|/n)_+ γ(h),

where (a)_+ = max(a, 0) is a if a ≥ 0 and 0 if a < 0.
If Σ_{h=−∞}^∞ |γ(h)| < ∞, then

Σ_{h=−∞}^∞ (1 − |h|/n)_+ γ(h) → Σ_{h=−∞}^∞ γ(h)

as n → ∞.
So

n var(x̄_n) → Σ_{h=−∞}^∞ γ(h),

or

var(x̄_n) = (1/n) Σ_{h=−∞}^∞ γ(h) + o(1/n).
Now

Σ_{h=−∞}^∞ γ(h) = f(0),

the spectral density at frequency zero, so

var(x̄_n) = f(0)/n + o(1/n):

the usual σ² is replaced by f(0).
But if Σ_{h=−∞}^∞ |γ(h)| = ∞, this argument fails, and var(x̄_n) can decay more slowly than 1/n.
Fractional Integration

How can we model such series?

Fractionally integrated white noise:

(1 − B)^d x_t = w_t.

The ACF is

ρ(h) = Γ(h + d)Γ(1 − d) / [Γ(h − d + 1)Γ(d)] ∝ h^{2d−1},

so

Σ_{h=−∞}^∞ |ρ(h)| = ∞.
Notes:

var(x̄_n) decays like n^{2d−1}, so

d = (1 + slope of variance-time graph) / 2

gives a rough empirical estimate of d (sketched below).
The spectral density is

f(ν) = σ_w² [4 sin²(πν)]^{−d},

so for d > 0, f(ν) → ∞ as ν → 0.
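A rough sketch of the variance-time estimate, simulating fractional noise with fracdiff.sim (d = 0.3 assumed):

library(fracdiff);
x = fracdiff.sim(20000, d = 0.3)$series;
ns = c(10, 20, 50, 100, 200);
v = sapply(ns, function(n) {
    m = floor(length(x) / n);
    var(colMeans(matrix(x[1:(n * m)], n, m)));  # sample var of block means x-bar_n
});
(1 + coef(lm(log(v) ~ log(ns)))[2]) / 2;  # slope gives d; should be near 0.3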
ARFIMA Model

The R function fracdiff() does not allow explanatory variables, but we can use it to calculate a profile likelihood function. E.g. global temperature versus cumulative CO2 emissions:

library(fracdiff);
source("http://www.stat.ncsu.edu/people/bloomfield/courses/st730/co2w.R");
plot(cbind(globtemp, co2w));
slopes = seq(from = 0, to = 1.5, length = 151);
ll2 = rep(NA, length(slopes));
for (i in 1:length(slopes))
    ll2[i] = -2 * fracdiff(globtemp - slopes[i] * co2w)$log.likelihood;
plot(slopes, ll2, type = "l");
abline(h = min(ll2) + qchisq(.95, 1));
The CO2 series was scaled by its change from 1900 to 2000, so we estimate the 20th century warming as 0.68°C, with a confidence interval of (0.41°C, 1.03°C) (note the asymmetry: 0.68 (−0.27, +0.35)°C).

Compare with IPCC: 1906 to 2005 warming is 0.74°C ± 0.18°C.
E.g. AR(1): conditionally on y_{t−1},

y_t ∼ N(μ + φ y_{t−1}, σ_w²).
ARCH Models

Simplest is ARCH(1):

y_t = σ_t ε_t,
σ_t² = α_0 + α_1 y_{t−1}²,

where ε_t is Gaussian white noise with variance 1.

Alternatively: conditionally on y_{t−1}, y_{t−2}, ...,

y_t ∼ N(0, α_0 + α_1 y_{t−1}²).
ARCH as AR

y_t² = σ_t² ε_t² = α_0 + α_1 y_{t−1}² + σ_t²(ε_t² − 1),

or

y_t² = α_0 + α_1 y_{t−1}² + v_t,

where

v_t = σ_t²(ε_t² − 1).
Note that

E(v_t | y_{t−1}, y_{t−2}, ...) = 0,

and hence that for h > 0,

E(v_t v_{t−h}) = E[E(v_t v_{t−h} | y_{t−1}, y_{t−2}, ...)]
             = E[v_{t−h} E(v_t | y_{t−1}, y_{t−2}, ...)]
             = 0,

so v_t is (highly nonnormal) white noise, and y_t² is AR(1).

For positivity and stationarity, α_0 > 0 and 0 ≤ α_1 < 1, and unconditionally,

E(y_t²) = var(y_t) = α_0 / (1 − α_1).
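A simulation sketch of ARCH(1) (α_0, α_1 arbitrary): y_t shows no autocorrelation, but y_t² decays geometrically like an AR(1).

n = 2000; a0 = 0.1; a1 = 0.5;
y = numeric(n);
for (t in 2:n) {
    s2 = a0 + a1 * y[t - 1]^2;   # conditional variance sigma_t^2
    y[t] = sqrt(s2) * rnorm(1);  # y_t = sigma_t * eps_t
}
par(mfcol = c(2, 1));
acf(y);    # essentially zero at all nonzero lags
acf(y^2);  # AR(1)-like geometric decay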
The GARCH(m, r) model:

σ_t² = α_0 + Σ_{j=1}^m α_j y_{t−j}² + Σ_{j=1}^r β_j σ_{t−j}²,

with (for GARCH(1, 1))

α_1 + β_1 < 1

for stationarity, and then unconditionally

var(y_t) = α_0 / (1 − α_1 − β_1).
In SAS, use proc autoreg and the garch option on the model
statement.
In R, explore and describe volatility:
nyse = ts(scan("nyse.dat"));
par(mfcol = c(2, 1));
plot(nyse);
plot(abs(nyse));
lines(lowess(time(nyse), abs(nyse), f = .005), col = "red");
par(mfcol = c(2, 2));
acf(nyse);
acf(abs(nyse));
acf(nyse^2);
A special case: GARCH(1, 1) with α_1 + β_1 = 1 is IGARCH(1, 1):

y_t = σ_t ε_t,
σ_t² = α_0 + (1 − β_1) y_{t−1}² + β_1 σ_{t−1}².

Solving recursively with α_0 = 0:

σ_t² = (1 − β_1) Σ_{j=1}^∞ β_1^{j−1} y_{t−j}².
Tail Length

[Equation lost in extraction.]
R Update (Fall 2011)
Shumway and Stoffer's code for Example 5.3 does not work with the R garch function.
gnp96 = read.table("http://www.stat.pitt.edu/stoffer/tsa2/data/gnp96.dat");
gnpr = ts(diff(log(gnp96[, 2])), frequency = 4, start = c(1947, 1));
library(fGarch);
gnpr.mod = garchFit(gnpr ~ arma(1, 0) + garch(1, 0), data.frame(gnpr = gnpr));
summary(gnpr.mod);
Title:
 GARCH Modelling

Call:
 garchFit(formula = gnpr ~ arma(1, 0) + garch(1, 0),
    data = data.frame(gnpr = gnpr))

Mean and Variance Equation:
 data ~ arma(1, 0) + garch(1, 0)
 [data = data.frame(gnpr = gnpr)]

Conditional Distribution:
 norm
Coefficient(s):
        mu         ar1       omega      alpha1
0.00527795  0.36656255  0.00007331  0.19447134

Std. Errors:
 based on Hessian

Error Analysis:
        Estimate  Std. Error  t value  Pr(>|t|)
mu     5.278e-03   8.996e-04    5.867  4.44e-09 ***
ar1    3.666e-01   7.514e-02    4.878  1.07e-06 ***
omega  7.331e-05   9.011e-06    8.135  4.44e-16 ***
alpha1 1.945e-01   9.554e-02    2.035    0.0418 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1

Log Likelihood:
 722.2849    normalized:  3.253536
Standardised Residuals Tests:
                                Statistic   p-Value
 Jarque-Bera Test   R    Chi^2  9.118036    0.01047234
 Shapiro-Wilk Test  R    W      0.9842405   0.01433578
 Ljung-Box Test     R    Q(10)  9.874326    0.4515875
 Ljung-Box Test     R    Q(15)  17.55855    0.2865844
 Ljung-Box Test     R    Q(20)  23.41363    0.2689437
 Ljung-Box Test     R^2  Q(10)  19.2821     0.03682245
 Ljung-Box Test     R^2  Q(15)  33.23648    0.004352734
 Ljung-Box Test     R^2  Q(20)  37.74259    0.009518987
 LM Arch Test       R    TR^2   25.41625    0.01296901
Threshold Models

[Model details lost in extraction.]
Regression model

y_t = z_t′β + x_t,

where x_t has covariance matrix Γ. The generalized least squares estimator is

β̂ = (Z′Γ^{−1}Z)^{−1} Z′Γ^{−1}y.
Transfer function model:

y_t = Σ_{j=0}^∞ α_j x_{t−j} + η_t = α(B)x_t + η_t.

This generalizes regression with correlated errors by including lags, and specializes the frequency domain lagged regression by excluding future inputs.
If x_t is white noise,

γ_{y,x}(h) = E[(Σ_{j=0}^∞ α_j x_{t+h−j} + η_{t+h}) x_t] = α_h var(x_t),

so γ_{y,x}(h) / var(x_t) provides an estimate of α_h.
Prewhitening

With the input prewhitened to white noise w_t,

y_t = Σ_{j=0}^∞ α_j w_{t−j} + η_t.
Finally estimate the model for y_t, specifying the input series, in the form:

input = ( d $ (L1,1, L1,2, ...) ... (Lk,1, ...) / (Lk+1,1, ...) ... (...) variable )
E.g. for global temperature and an estimated historical forcing series: program and output.

The profile likelihood for climate sensitivity, constructed using a grid search in R (with p = 4), gives an estimated value of 1.85°C and 95% confidence limits of 1.44°C to 2.27°C.
[Figures: profile -2 log likelihood (ll2) plotted against climate sensitivity (4.4 * theta) and against lambda.]
ARMAX Models

Vector (multivariate) regression:

output vector  y_t = (y_{t,1}, y_{t,2}, ..., y_{t,k})′,
input vector   z_t = (z_{t,1}, z_{t,2}, ..., z_{t,r})′.
Regression equation:

y_{t,i} = β_{i,1} z_{t,1} + β_{i,2} z_{t,2} + ... + β_{i,r} z_{t,r} + w_{t,i},

or in vector form

y_t = B z_t + w_t.

Here {w_t} is multivariate white noise:

E(w_t) = 0,
cov(w_{t+h}, w_t) = Σ_w  if h = 0,  and 0  if h ≠ 0.
Given observations for t = 1, 2, ..., n, the least squares estimator of B, also the maximum likelihood estimator when {w_t} is Gaussian white noise, is

B̂ = Y Z′ (Z Z′)^{−1},

where

Y = [y_1, y_2, ..., y_n]  and  Z = [z_1, z_2, ..., z_n],

and the error covariance is estimated by

Σ̂_w = (1/n) Σ_{t=1}^n (y_t − B̂z_t)(y_t − B̂z_t)′.
Information criteria:

Akaike:

AIC = ln det(Σ̂_w) + (2/n) [kr + k(k + 1)/2];

Schwarz:

SIC = ln det(Σ̂_w) + (ln n / n) [kr + k(k + 1)/2];

corrected Akaike:

AICc = ln det(Σ̂_w) + [2/(n − kr − 1)] [kr + k(k + 1)/2].
Vector Autoregression

E.g., VAR(1):

x_t = α + Φ x_{t−1} + w_t.

Here Φ is a k × k coefficient matrix, and {w_t} is Gaussian multivariate white noise. This resembles the vector regression equation, with:

y_t = x_t,  B = [α, Φ],  z_t = (1, x_{t−1}′)′.
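Base R's ar() can fit a VAR by OLS; a sketch on simulated data (Φ arbitrary):

Phi = matrix(c(0.7, 0.1, 0.2, 0.5), 2, 2);
x = matrix(0, 500, 2);
for (t in 2:500) x[t, ] = Phi %*% x[t - 1, ] + rnorm(2);
fit = ar(x, order.max = 1, aic = FALSE, method = "ols");
fit$ar[1, , ];  # estimate of Phi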
VAR(1)

print method:

neg. log likelihood = -7188.785

A(L) =
  1-1.014698L1      0+0.05794167L1    0-0.04292339L1
  0-0.02482398L1    1-0.9224325L1     0-0.05304638L1
  0-0.0144053L1     0+0.03872528L1    1-1.024605L1

B(L) =
  1  0  0
  0  1  0
  0  0  1

summary method:

neg. log likelihood = -7188.785
sample length = 2448

         WGS1YR   y.WGS5YR    WGS10YR
RMSE  0.2005654  0.1713752  0.1563661

ARMA: model estimated by estVARXls
inputs :
outputs: WGS1YR y.WGS5YR WGS10YR
input dimension = 0
output dimension = 3
order A = 1
order B = 0
order C =
9 actual parameters
6 non-zero constants
trend not estimated.
VAR(2)

print method:

neg. log likelihood = -7414.944

A(L) =
  1-1.329215L1+0.3221239L2     0-0.07336772L1+0.05027099L2   0+0.0002002881L1-0.01317073L2
  0+0.1030711L1-0.05850615L2   1-1.117284L1+0.1974304L2      0-0.02287398L1+0.06233586L2
  0-0.1539836L1+0.1172694L     0-0.1148573L1+0.0577710       1-1.252808L1+0.226

B(L) =
  1  0  0
  0  1  0
  0  0  1

summary method:

neg. log likelihood = -7414.944

         WGS1YR   y.WGS5YR    WGS10YR
RMSE  0.1910442  0.1666275  0.1534016

ARMA: model estimated by estVARXls
inputs :
outputs: WGS1YR y.WGS5YR WGS10YR
input dimension = 0
output dimension = 3
order A = 2
order B = 0
order C =
18 actual parameters
6 non-zero constants
trend not estimated.
For VAR(1),

  0.3288773   0.08581201    0.06575108
  0.1534516   0.004959931   0.04152504
  0.136938    0.08875425    0.2406055