Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.3 or any later version
published by the Free Software Foundation; with no Invariant Sections, no Front-
Cover Texts, and no Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".
SAS and all other SAS Institute Inc. product or service names are registered trade-
marks or trademarks of SAS Institute Inc. in the USA and other countries. Windows
is a trademark and Microsoft is a registered trademark of the Microsoft Corporation.
The authors accept no responsibility for errors in the programs mentioned or for their
consequences.
Preface
The analysis of real data by means of statistical methods, with the aid
of a software package common in industry and administration, is usually
not an integral part of mathematics studies, but it will certainly be
part of future professional work.
The practical need for an investigation of time series data is exempli-
fied by the following plot, which displays the yearly sunspot numbers
between 1749 and 1924. These data are also known as the Wolf or
Wölfer (a student of Wolf) Data. For a discussion of these data and
further literature we refer to Wei and Reilly (1989), Example 6.2.5.
The present book links up elements from time series analysis with a selection
of statistical procedures used in general practice, including the statistical
software package SAS.
Yt = Tt + Zt + St + Rt , t = 1, . . . , n. (1.1)
Gt = Tt + Zt , (1.2)
MONTH T UNEMPLYD
July 1 60572
August 2 52461
September 3 47357
October 4 48320
November 5 60219
December 6 84418
January 7 119916
February 8 124350
March 9 87309
April 10 57035
May 11 39903
June 12 34053
July 13 29905
August 14 28068
September 15 26634
October 16 29259
November 17 38942
December 18 65036
January 19 110728
February 20 108931
March 21 71517
April 22 54428
May 23 42911
June 24 37123
July 25 33044
August 26 30755
September 27 28742
October 28 31968
November 29 41427
December 30 63685
January 31 99189
February 32 104240
March 33 75304
April 34 43622
May 35 33990
June 36 26819
July 37 25291
August 38 24538
September 39 22685
October 40 23945
November 41 28245
December 42 47017
January 43 90920
February 44 89340
March 45 47792
April 46 28448
May 47 19139
June 48 16728
July 49 16523
August 50 16622
September 51 15499
Listing 1.1.1: Unemployed1 Data.
/* unemployed1_listing.sas */
TITLE1 'Listing';
TITLE2 'Unemployed1 Data';

/* Read in the data (Data-step) */
DATA data1;
INFILE 'c:\data\unemployed1.txt';
INPUT month $ t unemplyd;

/* Print the data (Proc-step) */
PROC PRINT DATA = data1 NOOBS;
RUN; QUIT;
This program consists of two main parts, a DATA and a PROC step.
The DATA step started with the DATA statement creates a temporary dataset named data1. The purpose of INFILE is to link the DATA step to a raw dataset outside the program. The pathname of this dataset depends on the operating system; we will use the syntax of MS-DOS, which is most commonly known. INPUT tells SAS how to read the data. Three variables are defined here, where the first one contains character values. This is determined by the $ sign behind the variable name. For each variable one value per line is read from the source into the computer's memory.
The statement PROC procedurename DATA=filename; invokes a procedure that is linked to the data from filename. Without the option DATA=filename the most recently created file is used.
The PRINT procedure lists the data; it comes with numerous options that allow control of the variables to be printed out, 'dress up' of the display etc. The SAS internal observation number (OBS) is printed by default; NOOBS suppresses the column of observation numbers on each line of output. An optional VAR statement determines the order (from left to right) in which variables are displayed. If not specified (like here), all variables in the data set will be printed in the order they were defined to SAS. Entering RUN; at any point of the program tells SAS that a unit of work (DATA step or PROC) ended. SAS then stops reading the program and begins to execute the unit. The QUIT; statement at the end terminates the processing of SAS.
A line starting with an asterisk * and ending with a semicolon ; is ignored. These comment statements may occur at any point of the program except within raw data or another statement.
The TITLE statement generates a title. Its printing is actually suppressed here and in the following.
/* unemployed1_plot.sas */
TITLE1 'Plot';
TITLE2 'Unemployed1 Data';

/* Read in the data */
DATA data1;
INFILE 'c:\data\unemployed1.txt';
INPUT month $ t unemplyd;

/* Graphical Options */
AXIS1 LABEL=(ANGLE=90 'unemployed');
AXIS2 LABEL=('t');
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.4 W=1;

/* Plot the data */
PROC GPLOT DATA=data1;
PLOT unemplyd*t / VAXIS=AXIS1 HAXIS=AXIS2;
RUN; QUIT;
Variables can be plotted by using the GPLOT procedure, where the graphical output is controlled by numerous options.
The AXIS statements with the LABEL options control labelling of the vertical and horizontal axes. ANGLE=90 causes a rotation of the label of 90° so that it parallels the (vertical) axis in this example.
The SYMBOL statement defines the manner in which the data are displayed. V=DOT C=GREEN I=JOIN H=0.4 W=1 tell SAS to plot green dots of height 0.4 and to join them with a line of width 1. The PLOT statement in the GPLOT procedure is of the form PLOT y-variable*x-variable / options;, where the options here define the horizontal and the vertical axes.
/* logistic.sas */
TITLE1 'Plots of the Logistic Function';

/* Generate the data for different logistic functions */
DATA data1;
beta3=1;
DO beta1= 0.5, 1;
DO beta2=0.1, 1;
DO t=-10 TO 10 BY 0.5;
s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
f_log=beta3/(1+beta2*EXP(-beta1*t));
OUTPUT;
END;
END;
END;

/* Graphical Options */
SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
SYMBOL4 C=GREEN V=NONE I=JOIN L=33;
AXIS1 LABEL=(H=2 'f' H=1 'log' H=2 '(t)');
AXIS2 LABEL=('t');
LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ', b' H=1 '2' H=2 ',b' H=1 '3' H=2 ')=');
This means that there is a linear relationship among the values $1/f_{\mathrm{log}}(t)$. This
can serve as a basis for estimating the parameters $\beta_1, \beta_2, \beta_3$ by an
appropriate linear least squares approach, see Exercises 1.2 and 1.3.
In the following example we fit the logistic trend model (1.5) to the
population growth of the area of North Rhine-Westphalia (NRW),
which is a federal state of Germany.
The data consist of the population sizes $y_t$ in 5-year steps from 1935 to 1980 as well as their predicted values $\hat y_t$,
obtained from a least squares estimation as described in (1.4) for a
logistic model.
$$\hat y_t := \frac{\hat\beta_3}{1+\hat\beta_2\exp(-\hat\beta_1 t)} = \frac{21.5016}{1+1.1436\exp(-0.1675\,t)}$$
with the estimated saturation size β̂3 = 21.5016. The following plot
shows the data and the fitted logistic curve.
/* population1.sas */
TITLE1 'Population sizes and logistic fit';
TITLE2 'Population1 Data';

/* Read in the data */
DATA data1;
INFILE 'c:\data\population1.txt';
INPUT year t pop;

/* Compute parameters for fitted logistic function */
PROC NLIN DATA=data1 OUTEST=estimate;
MODEL pop=beta3/(1+beta2*EXP(-beta1*t));
PARAMETERS beta1=1 beta2=1 beta3=20;
RUN;

/* Generate fitted logistic function */
DATA data2;
SET estimate(WHERE=(_TYPE_='FINAL'));
DO t1=0 TO 11 BY 0.2;
f_log=beta3/(1+beta2*EXP(-beta1*t1));
OUTPUT;
END;

/* Merge data sets (as described in the explanation below) */
DATA data3;
MERGE data1 data2;

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'population in millions');
AXIS2 LABEL=('t');
SYMBOL1 V=DOT C=GREEN I=NONE;
SYMBOL2 V=NONE C=GREEN I=JOIN W=1;

/* Plot data with fitted function */
PROC GPLOT DATA=data3;
PLOT pop*t=1 f_log*t1=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
RUN; QUIT;
The procedure NLIN fits nonlinear regression models by least squares. The OUTEST option names the data set to contain the parameter estimates produced by NLIN. The MODEL statement defines the prediction equation by declaring the dependent variable and defining an expression that evaluates predicted values. A PARAMETERS statement must follow the PROC NLIN statement. Each parameter=value expression specifies the starting values of the parameter. Using the final estimates of PROC NLIN by the SET statement in combination with the WHERE data set option, the second data step generates the fitted logistic function values. The options in the GPLOT statement cause the data points and the predicted function to be shown in one plot, after they were stored together in a new data set data3 merging data1 and data2 with the MERGE statement.
/* gompertz.sas */
TITLE1 'Gompertz curves';

/* Generate the data for different Gompertz functions */
DATA data1;
beta1=1;
DO beta2=-1, 1;
DO beta3=0.05, 0.5;
DO t=0 TO 4 BY 0.05;
s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
f_g=EXP(beta1+beta2*beta3**t);
OUTPUT;
END;
END;
END;

/* Graphical Options */
SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
SYMBOL4 C=GREEN V=NONE I=JOIN L=33;
AXIS1 LABEL=(H=2 'f' H=1 'G' H=2 '(t)');
AXIS2 LABEL=('t');
LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ',b' H=1 '2' H=2 ',b' H=1 '3' H=2 ')=');
We obviously have in this case the linear regression model
$$\log(y_t) = \log(\beta_2) + \beta_1\log(t) + \varepsilon_t.$$
The least squares estimates of $\beta_1$ and $\log(\beta_2)$ in this linear regression model are (see, for example, Falk et al., 2002, Theorem 3.2.2)
$$\hat\beta_1 = \frac{\sum_{t=1}^{10}\bigl(\log(t)-\overline{\log(t)}\bigr)\bigl(\log(y_t)-\overline{\log(y)}\bigr)}{\sum_{t=1}^{10}\bigl(\log(t)-\overline{\log(t)}\bigr)^2} = 1.019,$$
where $\overline{\log(t)} := \frac{1}{10}\sum_{t=1}^{10}\log(t) = 1.5104$ and $\overline{\log(y)} := \frac{1}{10}\sum_{t=1}^{10}\log(y_t) = 0.7849$, and hence
$$\widehat{\log(\beta_2)} = \overline{\log(y)} - \hat\beta_1\,\overline{\log(t)} = -0.7549.$$
We estimate $\beta_2$ therefore by
$$\hat\beta_2 := \exp\bigl(\widehat{\log(\beta_2)}\bigr) = \exp(-0.7549) \approx 0.4700.$$
t     y_t − ŷ_t
1      0.0159
2      0.0201
3     −0.1176
4     −0.0646
5      0.1430
6      0.1017
7     −0.1583
8     −0.2526
9     −0.0942
10     0.5662

Table 1.1.3: Residuals y_t − ŷ_t of model (1.11).
Table 1.1.3 lists the residuals yt − ŷt by which one can judge the
goodness of fit of the model (1.11).
A popular measure for assessing the fit is the squared multiple corre-
lation coefficient or R2 -value
$$R^2 := 1 - \frac{\sum_{t=1}^{n}(y_t-\hat y_t)^2}{\sum_{t=1}^{n}(y_t-\bar y)^2}. \tag{1.12}$$
Note that the residuals $\tilde y_t - \hat{\tilde y}_t = y_t - \hat y_t$ are not influenced by adding
the constant 5.178 to $y_t$. The above models might help to judge the
average tax payer's situation between 1960 and 1970 and to predict
his future one. It is apparent from the residuals in Table 1.1.3 that
the net income yt is an almost perfect multiple of t for t between 1
and 9, whereas the large increase y10 in 1970 seems to be an outlier.
Actually, in 1969 the German government had changed and in 1970 a
long strike in Germany caused an enormous increase in the income of
civil servants.
Yt = Tt + St + Rt , t = 1, 2, . . . (1.13)
Linear Filters
Let a−r , a−r+1 , . . . , as be arbitrary real numbers, where r, s ≥ 0, r +
s + 1 ≤ n. The linear transformation
$$Y_t^* := \sum_{u=-r}^{s} a_u Y_{t-u},\qquad t = s+1,\dots,n-r,$$
$$Y_{t+1}^* = Y_t^* + \frac{1}{2s+1}\bigl(Y_{t+s+1} - Y_{t-s}\bigr).$$
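This recursion is immediate from the definition of the simple moving average of order $2s+1$: consecutive averaging windows differ only in their outermost observations, so that
$$Y_{t+1}^* - Y_t^* = \frac{1}{2s+1}\Bigl(\sum_{u=-s}^{s}Y_{t+1+u} - \sum_{u=-s}^{s}Y_{t+u}\Bigr) = \frac{1}{2s+1}\bigl(Y_{t+s+1}-Y_{t-s}\bigr).$$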
This filter is a particular example of a low-pass filter, which preserves
the slowly varying trend component of a series but removes from it the
rapidly fluctuating or high frequency component. There is a trade-off
between the two requirements that the irregular fluctuation should be
reduced by a filter, thus leading, for example, to a large choice of s in
a simple moving average, and that the long term variation in the data
should not be distorted by oversmoothing, i.e., by a too large choice
of s. Suppose, for example, a time series $Y_t = T_t + R_t$ without a seasonal component.
/* females.sas */
TITLE1 'Simple Moving Average of Order 17';
TITLE2 'Unemployed Females Data';

/* Read in the data and generate SAS-formatted date */
DATA data1;
INFILE 'c:\data\female.txt';
INPUT upd @@;
date=INTNX('month','01jan61'd, _N_-1);
FORMAT date yymon.;

/* Compute the simple moving averages of order 17 */
PROC EXPAND DATA=data1 OUT=data2 METHOD=NONE;
ID date;
CONVERT upd=ma17 / TRANSFORM=(CMOVAVE 17);

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'Unemployed Females');
AXIS2 LABEL=('Date');
SYMBOL1 V=DOT C=GREEN I=JOIN H=.5 W=1;
SYMBOL2 V=STAR C=GREEN I=JOIN H=.5 W=1;
LEGEND1 LABEL=NONE VALUE=('Original data' 'Moving average of order 17');

/* Plot the data together with the simple moving average */
PROC GPLOT DATA=data2;
PLOT upd*date=1 ma17*date=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;

RUN; QUIT;
In the data step the values for the variable upd are read from an external file. The option @@ allows SAS to read the data over line breaks in the original txt-file.
By means of the function INTNX, a new variable in a date format is generated, containing monthly data starting from the 1st of January 1961. The temporarily created variable _N_, which counts the number of cases, is used to determine the distance from the starting value. The FORMAT statement attributes the format yymon to this variable, consisting of four digits for the year and three for the month.
The SAS procedure EXPAND computes simple moving averages and stores them in the file specified in the OUT= option. EXPAND is also able to interpolate series. For example, if one has a quarterly series and wants to turn it into monthly data, this can be done by the method stated in the METHOD= option. Since we do not wish to do this here, we choose METHOD=NONE. The ID variable specifies the time index, in our case the date, by which the observations are ordered. The CONVERT statement now computes the simple moving average. The syntax is original=smoothed variable. The smoothing method is given in the TRANSFORM option. CMOVAVE number specifies a simple moving average of order number. Remark that for the values at the boundary the arithmetic mean of the data within the moving window is computed as the simple moving average. This is an extension of our definition of a simple moving average. Also other smoothing methods can be specified in the TRANSFORM statement, like the exponential smoother with smoothing parameter alpha (see page 33ff.) by EWMA alpha.
The smoothed values are plotted together with the original series against the date in the final step.
Seasonal Adjustment
A simple moving average of a time series Yt = Tt + St + Rt now
decomposes as
Yt∗ = Tt∗ + St∗ + Rt∗ ,
where St∗ is the pertaining moving average of the seasonal components.
Suppose, moreover, that St is a p-periodic function, i.e.,
St = St+p , t = 1, . . . , n − p.
Take for instance monthly average temperatures Yt measured at fixed
points, in which case it is reasonable to assume a periodic seasonal
component St with period p = 12 months. A simple moving average
of order p then yields a constant value St∗ = S, t = p, p + 1, . . . , n − p.
By adding this constant $S$ to the trend function $T_t$ and putting $T_t' :=
T_t + S$, we can assume in the following that $S = 0$. Thus we obtain
for the differences
Dt := Yt − Yt∗ ∼ St + Rt .
To estimate St we average the differences with lag p (note that they
vary around St ) by
$$\bar D_t := \frac{1}{n_t}\sum_{j=0}^{n_t-1} D_{t+jp} \sim S_t,\quad t=1,\dots,p;\qquad \bar D_t := \bar D_{t-p}\ \text{for}\ t > p,$$
where nt is the number of periods available for the computation of D̄t .
Thus,
$$\hat S_t := \bar D_t - \frac{1}{p}\sum_{j=1}^{p}\bar D_j \sim S_t - \frac{1}{p}\sum_{j=1}^{p}S_j = S_t \tag{1.14}$$
is an estimator of $S_t = S_{t+p} = S_{t+2p} = \dots$ satisfying
$$\frac{1}{p}\sum_{j=0}^{p-1}\hat S_{t+j} = 0 = \frac{1}{p}\sum_{j=0}^{p-1}S_{t+j}.$$
Month       d_t (rounded values)                  d̄_t (rounded)  ŝ_t (rounded)
            1976     1977     1978     1979
January     53201    56974    48469    52611      52814          53136
February    59929    54934    54102    51727      55173          55495
March       24768    17320    25678    10808      19643          19966
April       -3848       42    -5429        –      -3079          -2756
May        -19300   -11680   -14189        –     -15056         -14734
June       -23455   -17516   -20116        –     -20362         -20040
July       -26413   -21058   -20605        –     -22692         -22370
August     -27225   -22670   -20393        –     -23429         -23107
September  -27358   -24646   -20478        –     -24161         -23839
October    -23967   -21397   -17440        –     -20935         -20612
November   -14300   -10846   -11889        –     -12345         -12023
December    11540    12213     7923        –      10559          10881
Table 1.2.1: Table of dt , d¯t and of estimates ŝt of the seasonal compo-
nent St in the Unemployed1 Data.
/* temperatures.sas */
TITLE1 'Original and seasonally adjusted data';
TITLE2 'Temperatures data';

/* Read in the data and generate SAS-formatted date */
DATA temperatures;
INFILE 'c:\data\temperatures.txt';
INPUT temperature;
date=INTNX('month','01jan95'd,_N_-1);
FORMAT date yymon.;

/* Make seasonal adjustment */
PROC TIMESERIES DATA=temperatures OUT=series SEASONALITY=12 OUTDECOMP=deseason;
VAR temperature;
DECOMP /MODE=ADD;

/* Merge necessary data for plot */
DATA plotseries;
MERGE temperatures deseason(KEEP=SA);

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'temperatures');
AXIS2 LABEL=('Date');
$$Y_t = T_t + S_t + R_t$$

Plot 1.2.3: Plot of the Unemployed1 Data $y_t$ and of $y_t^{(2)}$, seasonally adjusted by the X-11 procedure.
/* unemployed1_x11.sas */
TITLE1 'Original and X-11 seasonal adjusted data';
TITLE2 'Unemployed1 Data';

/* Read in the data and generate SAS-formatted date */
DATA data1;
INFILE 'c:\data\unemployed1.txt';
INPUT month $ t upd;
date=INTNX('month','01jul75'd, _N_-1);
FORMAT date yymon.;

/* Apply X-11-Program */
PROC X11 DATA=data1;
MONTHLY DATE=date ADDITIVE;
VAR upd;
OUTPUT OUT=data2 B1=upd D11=updx11;

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'unemployed');
AXIS2 LABEL=('Date');
SYMBOL1 V=DOT C=GREEN I=JOIN H=1 W=1;
SYMBOL2 V=STAR C=GREEN I=JOIN H=1 W=1;
LEGEND1 LABEL=NONE VALUE=('original' 'adjusted');
If we differentiate the left hand side with respect to each βj and set
the derivatives equal to zero, we see that the minimizers satisfy the
p + 1 linear equations
$$\beta_0\sum_{u=-k}^{k}u^j + \beta_1\sum_{u=-k}^{k}u^{j+1} + \dots + \beta_p\sum_{u=-k}^{k}u^{j+p} = \sum_{u=-k}^{k}u^j y_{t+u},\qquad j = 0,\dots,p,$$
which can be written in matrix form as
$$X^T X\beta = X^T y \tag{1.16}$$
where
$$X = \begin{pmatrix} 1 & -k & (-k)^2 & \dots & (-k)^p\\ 1 & -k+1 & (-k+1)^2 & \dots & (-k+1)^p\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & k & k^2 & \dots & k^p \end{pmatrix}. \tag{1.17}$$
Difference Filter
We have already seen that we can remove a periodic seasonal compo-
nent from a time series by utilizing an appropriate linear filter. We
will next show that also a polynomial trend function can be removed
by a suitable linear filter.
Lemma 1.2.6. For a polynomial f (t) := c0 + c1 t + · · · + cp tp of degree
p, the difference
∆f (t) := f (t) − f (t − 1)
is a polynomial of degree at most p − 1.
Proof. The assertion is an immediate consequence of the binomial
expansion
$$(t-1)^p = \sum_{k=0}^{p}\binom{p}{k}t^k(-1)^{p-k} = t^p - pt^{p-1} + \dots + (-1)^p.$$
The first-order difference is $\Delta Y_t = Y_t - Y_{t-1}$, and higher-order differences are defined recursively by $\Delta^p Y_t = \Delta(\Delta^{p-1}Y_t)$, $t = p,\dots,n$. For the second-order difference we obtain
$$\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1} = Y_t - Y_{t-1} - Y_{t-1} + Y_{t-2} = Y_t - 2Y_{t-1} + Y_{t-2}.$$
Plot 1.2.4: Annual electricity output, first and second order differ-
ences.
/* electricity_differences.sas */
TITLE1 'First and second order differences';
TITLE2 'Electricity Data';
/* Note that this program requires the macro mkfields.sas to be submitted before this program */

/* ... (data step reading the data and computing delta1 omitted) ... */
delta2=DIF(delta1);

/* Graphical options */
AXIS1 LABEL=NONE;
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;

/* Generate three plots */
GOPTIONS NODISPLAY;
PROC GPLOT DATA=data1 GOUT=fig;
PLOT sum*year / VAXIS=AXIS1 HAXIS=AXIS2;
PLOT delta1*year / VAXIS=AXIS1 VREF=0;
PLOT delta2*year / VAXIS=AXIS1 VREF=0;
RUN;
Exponential Smoother
Let $Y_0,\dots,Y_n$ be a time series and let $\alpha\in[0,1]$ be a constant. The linear filter
$$Y_t^* := \alpha Y_t + (1-\alpha)Y_{t-1}^*,\qquad t = 1,\dots,n,$$
with $Y_0^* := Y_0$ is called exponential smoother. If the mean of the series shifts from $\mu$ to $\lambda$ at some time $N$, one computes for $t\ge N$
$$E(Y_t^*) = \lambda\bigl(1-(1-\alpha)^{t-N+1}\bigr) + \mu\Bigl((1-\alpha)^{t-N+1}\bigl(1-(1-\alpha)^{N-1}\bigr) + (1-\alpha)^t\Bigr) \longrightarrow_{t\to\infty} \lambda, \tag{1.21}$$
i.e., the exponential smoother adapts to the new level $\lambda$.
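The recursion can also be carried out step by step in a DATA step. The following sketch (data set and variable names are illustrative and assume a series y in a data set data1, with α = 0.3) uses the RETAIN statement to keep the previous smoothed value; PROC EXPAND with TRANSFORM=(EWMA 0.3) computes the same smoothing, as remarked in the discussion of Program 1.2.1 (females.sas).

/* exponential_smoother.sas (illustrative sketch) */
DATA smooth;
SET data1;                      /* assumes a data set data1 with variable y */
RETAIN ystar;                   /* keep the previous value Y*_{t-1} */
IF _N_=1 THEN ystar=y;          /* initialize Y*_0 := Y_0 */
ELSE ystar=0.3*y+0.7*ystar;     /* Y*_t = alpha*Y_t + (1-alpha)*Y*_{t-1} */
RUN;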
/* sunspot_correlogram.sas */
TITLE1 'Correlogram of first order differences';
TITLE2 'Sunspot Data';

/* Read in the data, generate year of observation and
   compute first order differences */
DATA data1;
INFILE 'c:\data\sunspot.txt';
INPUT spot @@;
date=1748+_N_;
diff1=DIF(spot);

/* Compute autocorrelation function */
PROC ARIMA DATA=data1;
IDENTIFY VAR=diff1 NLAG=49 OUTCOV=corr NOPRINT;

/* Graphical options */
AXIS1 LABEL=('r(k)');
AXIS2 LABEL=('k') ORDER=(0 12 24 36 48) MINOR=(N=11);
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;

/* Plot autocorrelation function */
PROC GPLOT DATA=corr;
PLOT CORR*LAG / VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
RUN; QUIT;
In the data step, the raw data are read into the variable spot. The specification @@ suppresses the automatic line feed of the INPUT statement after every entry in each row, see also Program 1.2.1 (females.sas). The variable date and the first order differences of the variable of interest spot are calculated.
The following procedure ARIMA is a crucial one in time series analysis. Here we just need the autocorrelation of diff1, which will be calculated up to a lag of 49 (NLAG=49) by the IDENTIFY statement. The option OUTCOV=corr causes SAS to create a data set corr containing among others the variables LAG and CORR. These two are used in the following GPLOT procedure to obtain a plot of the autocorrelation function. The ORDER option in the AXIS2 statement specifies the values to appear on the horizontal axis as well as their order, and the MINOR option determines the number of minor tick marks between two major ticks. VREF=0 generates a horizontal reference line through the value 0 on the vertical axis.
|ρ(k)| ≤ 1 = ρ(0).
/* airline_plot.sas */
TITLE1 'Monthly totals from January 49 to December 60';
TITLE2 'Airline Data';

/* Read in the data */
DATA data1;
INFILE 'c:\data\airline.txt';
INPUT y;
t=_N_;

/* Graphical options */
AXIS1 LABEL=NONE ORDER=(0 12 24 36 48 60 72 84 96 108 120 132 144) MINOR=(N=5);
AXIS2 LABEL=(ANGLE=90 'total in thousands');
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;
The variability of the data $y_t$ obviously increases with their level. The
log-transformed data $x_t = \log(y_t)$, displayed in the following figure,
however, show no such dependence of the variability on the level.
/* airline_log.sas */
TITLE1 'Logarithmic transformation';
TITLE2 'Airline Data';

/* Read in the data and compute log-transformed data */
DATA data1;
INFILE 'c:\data\airline.txt';
INPUT y;
t=_N_;
x=LOG(y);

/* Graphical options */
AXIS1 LABEL=NONE ORDER=(0 12 24 36 48 60 72 84 96 108 120 132 144) MINOR=(N=5);
AXIS2 LABEL=NONE;
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;

/* Plot log-transformed data */
PROC GPLOT DATA=data1;
PLOT x*t / HAXIS=AXIS1 VAXIS=AXIS2;
RUN; QUIT;
The plot of the log-transformed data is done in the same manner as for the original data in Program 1.3.2 (airline_plot.sas). The only differences are the log-transformation by means of the LOG function and the suppressed label on the vertical axis.
The fact that taking the logarithm of data often reduces their variability
can be illustrated as follows. Suppose, for example, that
the data were generated by random variables, which are of the form
Yt = σt Zt , where σt > 0 is a scale factor depending on t, and Zt ,
t ∈ Z, are independent copies of a positive random variable Z with
variance 1. The variance of Yt is in this case σt2 , whereas the variance
of log(Yt ) = log(σt ) + log(Zt ) is a constant, namely the variance of
log(Z), if it exists.
A transformation of the data, which reduces the dependence of the
variability on their height, is called variance stabilizing. The loga-
rithm is a particular case of the general Box–Cox (1964) transforma-
tion Tλ of a time series (Yt ), where the parameter λ ≥ 0 is chosen by
the statistician:
$$T_\lambda(Y_t) := \begin{cases} (Y_t^\lambda - 1)/\lambda, & Y_t \ge 0,\ \lambda > 0,\\ \log(Y_t), & Y_t > 0,\ \lambda = 0. \end{cases}$$
Note that $\lim_{\lambda\downarrow0} T_\lambda(Y_t) = T_0(Y_t) = \log(Y_t)$ if $Y_t > 0$ (Exercise 1.22).
Popular choices of the parameter λ are 0 and 1/2. A variance stabi-
lizing transformation of the data, if necessary, usually precedes any
further data manipulation such as trend or seasonal adjustment.
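As an illustrative sketch (assuming the Airline Data in a data set data1 with the positive variable y as above), the Box–Cox transformation with the popular choice λ = 1/2 reduces to one assignment in a DATA step:

/* boxcox.sas (illustrative sketch) */
DATA boxcox;
SET data1;                    /* assumes a data set data1 with positive variable y */
lambda=0.5;
ty=(y**lambda-1)/lambda;      /* T_lambda(y) for lambda > 0 */
RUN;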
Exercises
1.1. Plot the Mitscherlich function for different values of β1 , β2 , β3
using PROC GPLOT.
1.2. Put in the logistic trend model (1.5) zt := 1/yt ∼ 1/ E(Yt ) =
1/flog (t), t = 1, . . . , n. Then we have the linear regression model
zt = a + bzt−1 + εt , where εt is the error variable. Compute the least
squares estimates â, b̂ of a, b and motivate the estimates β̂1 := − log(b̂),
β̂3 := (1 − exp(−β̂1 ))/â as well as
$$\hat\beta_2 := \exp\Bigl(\frac{n+1}{2}\,\hat\beta_1 + \frac{1}{n}\sum_{t=1}^{n}\log\Bigl(\frac{\hat\beta_3}{y_t}-1\Bigr)\Bigr),$$
Year Unemployed
1950 1869
1960 271
1970 149
1975 1074
1980 889
1985 2304
1988 2242
1989 2038
1990 1883
1991 1689
1992 1808
1993 2270
1.33 0.94 0.79 0.83 0.61 0.77 0.93 0.97 1.20 1.33
1.36 1.55 0.95 0.59 0.61 0.83 1.06 1.21 1.16 1.01
0.97 1.02 1.04 0.98 1.07 0.88 1.28 1.25 1.09 1.31
1.26 1.10 0.81 0.92 0.90 0.93 0.94 0.92 0.85 0.77
0.83 1.08 0.87 0.84 0.88 0.99 0.99 1.10 1.32 1.00
0.80 0.58 0.38 0.49 1.02 1.19 0.94 1.01 1.00 1.01
1.07 1.08 1.04 1.21 1.33 1.53
1.21. Verify that the empirical correlation $r(k)$ at lag $k$ for the trend
$y_t = t$, $t = 1,\dots,n$, is given by
$$r(k) = 1 - 3\frac{k}{n} + 2\frac{k(k^2-1)}{n(n^2-1)},\qquad k = 0,\dots,n.$$
Plot the correlogram for different values of $n$. This example shows
that the correlogram has no interpretation for non-stationary processes
(see Exercise 1.20).
Stationary Processes
A stochastic process $(Y_t)_{t\in\mathbb{Z}}$ of square integrable complex valued random variables is said to be (weakly) stationary if for any $t_1, t_2, k \in\mathbb{Z}$
$$E(Y_{t_1}) = E(Y_{t_1+k})\quad\text{and}\quad E(Y_{t_1}\bar Y_{t_2}) = E(Y_{t_1+k}\bar Y_{t_2+k}).$$
The random variables of a stationary process (Yt )t∈Z have identical
means and variances. The autocovariance function satisfies moreover
for $s,t\in\mathbb{Z}$
$$\gamma(t,s) := \operatorname{Cov}(Y_t, Y_s) = \operatorname{Cov}(Y_{t-s}, Y_0) =: \gamma(t-s) = \operatorname{Cov}(Y_0, Y_{t-s}) = \operatorname{Cov}(Y_{s-t}, Y_0) = \gamma(s-t),$$
and thus, the autocovariance function of a stationary process can be
viewed as a function of a single argument satisfying γ(t) = γ(−t), t ∈
Z.
A stationary process $(\varepsilon_t)_{t\in\mathbb{Z}}$ of square integrable and uncorrelated real
valued random variables is called white noise, i.e., $\operatorname{Cov}(\varepsilon_t,\varepsilon_s) = 0$ for
$t\ne s$, and there exist $\mu\in\mathbb{R}$, $\sigma\ge0$ such that
$$E(\varepsilon_t) = \mu,\qquad E\bigl((\varepsilon_t-\mu)^2\bigr) = \sigma^2,\qquad t\in\mathbb{Z}.$$
In Section 1.2 we defined linear filters of a time series, which were
based on a finite number of real valued weights. In the following
we consider linear filters with an infinite number of complex valued
weights.
Suppose that $(\varepsilon_t)_{t\in\mathbb{Z}}$ is a white noise and let $(a_t)_{t\in\mathbb{Z}}$ be a sequence of complex numbers satisfying $\sum_{t=-\infty}^{\infty}|a_t| := \sum_{t\ge0}|a_t| + \sum_{t\ge1}|a_{-t}| < \infty$.
Then $(a_t)_{t\in\mathbb{Z}}$ is said to be an absolutely summable (linear) filter and
$$Y_t := \sum_{u=-\infty}^{\infty} a_u\varepsilon_{t-u} := \sum_{u\ge0}a_u\varepsilon_{t-u} + \sum_{u\ge1}a_{-u}\varepsilon_{t+u},\qquad t\in\mathbb{Z},$$
Theorem 2.1.4. The space $(L_2, \|\cdot\|_2)$ is complete, i.e., suppose that
Xn ∈ L2 , n ∈ N, has the property that for arbitrary ε > 0 one can find
an integer N (ε) ∈ N such that ||Xn −Xm ||2 < ε if n, m ≥ N (ε). Then
there exists a random variable X ∈ L2 such that limn→∞ ||X −Xn ||2 =
0.
$$P\{|Y_t - X(t)|\ge\varepsilon\} \le P\{|Y_t - X_n(t)| + |X_n(t) - X(t)|\ge\varepsilon\} \le P\{|Y_t - X_n(t)|\ge\varepsilon/2\} + P\{|X(t)-X_n(t)|\ge\varepsilon/2\} \longrightarrow_{n\to\infty} 0$$
$$E(|Z_t|^2) = E\bigl(|Z_t-\mu_Z+\mu_Z|^2\bigr) = E\bigl((Z_t-\mu_Z+\mu_Z)\overline{(Z_t-\mu_Z+\mu_Z)}\bigr) = E\bigl(|Z_t-\mu_Z|^2\bigr) + |\mu_Z|^2 = \gamma_Z(0) + |\mu_Z|^2$$
and, thus,
$$\sup_{t\in\mathbb{Z}} E(|Z_t|^2) < \infty.$$
$$\begin{aligned}\gamma_Y(t-s) &= E\bigl((Y_t-\mu_Y)\overline{(Y_s-\mu_Y)}\bigr)\\ &= \lim_{n\to\infty}\operatorname{Cov}\Bigl(\sum_{u=-n}^{n}a_uZ_{t-u},\ \sum_{w=-n}^{n}a_wZ_{s-w}\Bigr)\\ &= \lim_{n\to\infty}\sum_{u=-n}^{n}\sum_{w=-n}^{n}a_u\bar a_w\operatorname{Cov}(Z_{t-u},Z_{s-w})\\ &= \lim_{n\to\infty}\sum_{u=-n}^{n}\sum_{w=-n}^{n}a_u\bar a_w\gamma_Z(t-s+w-u)\\ &= \sum_u\sum_w a_u\bar a_w\gamma_Z(t-s+w-u).\end{aligned}$$
This implies
$$\begin{aligned} G(z) &= \sigma^2\sum_t\sum_u a_u\bar a_{u-t}z^t\\ &= \sigma^2\Bigl(\sum_u|a_u|^2 + \sum_{t\ge1}\sum_u a_u\bar a_{u-t}z^t + \sum_{t\le-1}\sum_u a_u\bar a_{u-t}z^t\Bigr)\\ &= \sigma^2\Bigl(\sum_u|a_u|^2 + \sum_u\sum_{t\le u-1}a_u\bar a_t z^{u-t} + \sum_u\sum_{t\ge u+1}a_u\bar a_t z^{u-t}\Bigr)\\ &= \sigma^2\sum_u\sum_t a_u\bar a_t z^{u-t} = \sigma^2\Bigl(\sum_u a_u z^u\Bigr)\Bigl(\sum_t\bar a_t z^{-t}\Bigr).\end{aligned}$$
Example 2.1.8. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise with $\operatorname{Var}(\varepsilon_0) =: \sigma^2 > 0$. The covariance generating function of the simple moving average $Y_t = \sum_u a_u\varepsilon_{t-u}$ with $a_{-1} = a_0 = a_1 = 1/3$ and $a_u = 0$ elsewhere is then given by
$$G(z) = \frac{\sigma^2}{9}\bigl(z^{-1}+z^0+z^1\bigr)\bigl(z^1+z^0+z^{-1}\bigr) = \frac{\sigma^2}{9}\bigl(z^{-2}+2z^{-1}+3z^0+2z^1+z^2\bigr),\qquad z\in\mathbb{R}.$$
Then the autocovariances are just the coefficients in the above series
$$\gamma(0) = \frac{\sigma^2}{3},\qquad \gamma(1)=\gamma(-1)=\frac{2\sigma^2}{9},\qquad \gamma(2)=\gamma(-2)=\frac{\sigma^2}{9},\qquad \gamma(k)=0\ \text{elsewhere.}$$
This explains the name covariance generating function.
Inverse Filters
Let now $(a_u)$ and $(b_u)$ be absolutely summable filters and denote by $Y_t := \sum_u a_u Z_{t-u}$ the filtered stationary sequence, where $(Z_u)_{u\in\mathbb{Z}}$ is a stationary process. Filtering $(Y_t)_{t\in\mathbb{Z}}$ by means of $(b_u)$ leads to
$$\sum_w b_w Y_{t-w} = \sum_w\sum_u b_w a_u Z_{t-w-u} = \sum_v\Bigl(\sum_{u+w=v} b_w a_u\Bigr) Z_{t-v},$$
where $c_v := \sum_{u+w=v} b_w a_u$, $v\in\mathbb{Z}$, is an absolutely summable filter:
$$\sum_v |c_v| \le \sum_v\sum_{u+w=v}|b_w a_u| = \Bigl(\sum_u|a_u|\Bigr)\Bigl(\sum_w|b_w|\Bigr) < \infty.$$
Lemma 2.1.9. Let $(a_u)$ and $(b_u)$ be absolutely summable filters with characteristic polynomials $A_1(z)$ and $A_2(z)$, which both exist on some annulus $r < |z| < R$. The product filter $(c_v) = \bigl(\sum_{u+w=v} b_w a_u\bigr)$ then has the characteristic polynomial
$$A(z) = A_1(z)A_2(z).$$
Suppose now that $(a_u)$ and $(b_u)$ are absolutely summable filters with characteristic polynomials $A_1(z)$ and $A_2(z)$, which both exist on some annulus $r < |z| < R$, where they satisfy $A_1(z)A_2(z) = 1$. Since $1 = \sum_v c_v z^v$ if $c_0 = 1$ and $c_v = 0$ elsewhere, the uniquely determined coefficients of the characteristic polynomial of the product filter of $(a_u)$ and $(b_u)$ are given by
$$\sum_{u+w=v} b_w a_u = \begin{cases} 1 & \text{if } v = 0,\\ 0 & \text{if } v \ne 0.\end{cases}$$
In this case we obtain for a stationary process $(Z_t)$ that almost surely
$$Y_t = \sum_u a_u Z_{t-u}\quad\text{and}\quad \sum_w b_w Y_{t-w} = Z_t,\qquad t\in\mathbb{Z}. \tag{2.1}$$
The filter (bu ) is, therefore, called the inverse filter of (au ).
Causal Filters
An absolutely summable filter (au )u∈Z is called causal if au = 0 for
u < 0.
Lemma 2.1.10. Let $a\in\mathbb{C}$. The filter $(a_u)$ with $a_0 = 1$, $a_1 = -a$ and $a_u = 0$ elsewhere has an absolutely summable and causal inverse filter $(b_u)_{u\ge0}$ if and only if $|a| < 1$. In this case we have $b_u = a^u$, $u\ge0$.
Proof. The characteristic polynomial of (au ) is A1 (z) = 1−az, z ∈ C.
Since the characteristic polynomial A2 (z) of an inverse filter satisfies
A1 (z)A2 (z) = 1 on some annulus, we have A2 (z) = 1/(1−az). Observe
now that
$$\frac{1}{1-az} = \sum_{u\ge0} a^u z^u,\qquad \text{if } |z| < 1/|a|.$$
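Indeed, multiplying the two characteristic polynomials confirms that $(b_u)_{u\ge0} = (a^u)_{u\ge0}$ is the inverse filter:
$$A_1(z)A_2(z) = (1-az)\sum_{u\ge0}a^uz^u = \sum_{u\ge0}a^uz^u - \sum_{u\ge1}a^uz^u = 1,\qquad |z| < 1/|a|.$$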
$$A(z) = a_p(z-z_1)\cdots(z-z_p) = c\Bigl(1-\frac{z}{z_1}\Bigr)\Bigl(1-\frac{z}{z_2}\Bigr)\cdots\Bigl(1-\frac{z}{z_p}\Bigr),$$
which has an expansion $1/A(z) = \sum_{u\ge0}b_u z^u$ on some annulus with $\sum_{u\ge0}|b_u| < \infty$ if each factor has such an expansion, and thus, the proof is complete.
$$\text{(iv)}\quad \rho(v) = \frac{\gamma(v)}{\gamma(0)} = \begin{cases} 1, & v = 0,\\[4pt] \sum_{w=0}^{q-v} a_{v+w}a_w \Big/ \sum_{w=0}^{q} a_w^2, & 0 < v \le q,\\[4pt] 0, & v > q, \end{cases}\qquad \rho(-v) = \rho(v).$$
Example 2.2.2. The MA(1)-process $Y_t = \varepsilon_t + a\varepsilon_{t-1}$ with $a \ne 0$ has the autocorrelation function
$$\rho(v) = \begin{cases} 1, & v = 0,\\ a/(1+a^2), & v = \pm1,\\ 0 & \text{elsewhere.}\end{cases}$$
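In the style of the simulation programs below, a realization of an MA(1)-process and its empirical autocorrelations can be generated as follows (a sketch; the parameter a = 0.6, the sample size and the seed are chosen for illustration only). By the example above, r(1) should be close to a/(1 + a²) ≈ 0.44 and r(k), k ≥ 2, close to zero.

/* ma1_simulation.sas (illustrative sketch) */
DATA ma1;
a=0.6;
eold=RANNOR(41);             /* epsilon_0 */
DO t=1 TO 500;
enew=RANNOR(41);             /* epsilon_t */
y=enew+a*eold;               /* Y_t = epsilon_t + a*epsilon_{t-1} */
OUTPUT;
eold=enew;
END;
KEEP t y;

/* Empirical autocorrelations */
PROC ARIMA DATA=ma1;
IDENTIFY VAR=y NLAG=10;
RUN; QUIT;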
Invertible Processes
Example 2.2.2 shows that a MA(q)-process is not uniquely determined
by its autocorrelation function. In order to get a unique relationship
between moving average processes and their autocorrelation function,
Box and Jenkins introduced the condition of invertibility. This is
useful for estimation procedures, since the coefficients of an MA(q)-
process will be estimated later by the empirical autocorrelation func-
tion, see Section 2.3.
The MA(q)-process $Y_t = \sum_{u=0}^{q} a_u\varepsilon_{t-u}$, with $a_0 = 1$ and $a_q \ne 0$, is said to be invertible if all $q$ roots $z_1,\dots,z_q\in\mathbb{C}$ of $A(z) = \sum_{u=0}^{q} a_u z^u$ are outside of the unit circle, i.e., if $|z_i| > 1$ for $1 \le i \le q$. Theorem 2.1.11 and representation (2.1) imply that the white noise process $(\varepsilon_t)$, pertaining to an invertible MA(q)-process $Y_t = \sum_{u=0}^{q}a_u\varepsilon_{t-u}$, can be obtained by means of an absolutely summable and causal filter $(b_u)_{u\ge0}$ via
$$\varepsilon_t = \sum_{u\ge0} b_u Y_{t-u},\qquad t\in\mathbb{Z},$$
Autoregressive Processes
A real valued stochastic process $(Y_t)$ is said to be an autoregressive
process of order p, denoted by AR(p), if there exist $a_1,\dots,a_p\in\mathbb{R}$ with
$a_p \ne 0$, and a white noise $(\varepsilon_t)$ such that
$$Y_t = a_1Y_{t-1} + \dots + a_pY_{t-p} + \varepsilon_t,\qquad t\in\mathbb{Z}. \tag{2.3}$$
The value of an AR(p)-process at time t is, therefore, regressed on its
own past p values plus a random shock.
ρ(s) = a|s| , s ∈ Z.
/* ar1_autocorrelation.sas */
TITLE1 'Autocorrelation functions of AR(1)-processes';

/* Generate data for different autocorrelation functions */
DATA data1;
DO a=-0.7, 0.5, 0.9;
DO s=0 TO 20;
rho=a**s;
OUTPUT;
END;
END;

/* Graphical options */
SYMBOL1 C=GREEN V=DOT I=JOIN H=0.3 L=1;
SYMBOL2 C=GREEN V=DOT I=JOIN H=0.3 L=2;
SYMBOL3 C=GREEN V=DOT I=JOIN H=0.3 L=33;
AXIS1 LABEL=('s');
AXIS2 LABEL=(F=CGREEK 'r' F=COMPLEX H=1 'a' H=2 '(s)');
LEGEND1 LABEL=('a=') SHAPE=SYMBOL(10,0.6);

/* Plot autocorrelation functions */
PROC GPLOT DATA=data1;
PLOT rho*s=a / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1 VREF=0;
RUN; QUIT;
The data step evaluates rho for three different values of a and the range of s from 0 to 20 using two loops. The plot is generated by the procedure GPLOT. The LABEL option in the AXIS2 statement uses, in addition to the greek font CGREEK, the font COMPLEX, assuming this to be the default text font (GOPTION FTEXT=COMPLEX). The SHAPE option SHAPE=SYMBOL(10,0.6) in the LEGEND statement defines width and height of the symbols presented in the legend.
/* ar1_plot.sas */
TITLE1 'Realizations of AR(1)-processes';

/* Generate AR(1)-processes */
DATA data1;
DO a=0.5, 1.5;
t=0; y=0; OUTPUT;
DO t=1 TO 10;
y=a*y+RANNOR(1);
OUTPUT;
END;
END;

/* Graphical options */
SYMBOL1 C=GREEN V=DOT I=JOIN H=0.4 L=1;
SYMBOL2 C=GREEN V=DOT I=JOIN H=0.4 L=2;
AXIS1 LABEL=('t') MINOR=NONE;
AXIS2 LABEL=('Y' H=1 't');
LEGEND1 LABEL=('a=') SHAPE=SYMBOL(10,0.6);

/* Plot the AR(1)-processes */
PROC GPLOT DATA=data1(WHERE=(t>0));
PLOT y*t=a / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1;
RUN; QUIT;
The data are generated within two loops, the first one over the two values for a. The variable y is initialized with the value 0 corresponding to t=0. The realizations for t=1, ..., 10 are created within the second loop over t and with the help of the function RANNOR, which returns pseudo random numbers distributed as standard normal. The argument 1 is the initial seed to produce a stream of random numbers. A positive value of this seed always produces the same series of random numbers; a negative value generates a different series each time the program is submitted. A value of y is calculated as the sum of a times the actual value of y and the random number and stored in a new observation. The resulting data set has 22 observations and 3 variables (a, t and y).
In the plot created by PROC GPLOT the initial observations are dropped using the WHERE data set option. Only observations fulfilling the condition t>0 are read into the data set used here. To suppress minor tick marks between the integers 0, 1, ..., 10 the option MINOR in the AXIS1 statement is set to NONE.
$$\hat a_k := R_k^{-1} r_k.$$

$$\alpha(1) = \rho(1),\qquad \alpha(2) = a_2,\qquad \alpha(j) = 0,\quad j\ge3.$$
/* ar2_plot.sas */
TITLE1 'Realisation of an AR(2)-process';

/* Generate AR(2)-process */
DATA data1;
t=-1; y=0; OUTPUT;
t=0; y1=y; y=0; OUTPUT;
DO t=1 TO 200;
y2=y1;
y1=y;
y=0.6*y1-0.3*y2+RANNOR(1);
OUTPUT;
END;

/* Graphical options */
SYMBOL1 C=GREEN V=DOT I=JOIN H=0.3;
AXIS1 LABEL=('t');
AXIS2 LABEL=('Y' H=1 't');

/* Plot the AR(2)-process */
PROC GPLOT DATA=data1(WHERE=(t>0));
PLOT y*t / HAXIS=AXIS1 VAXIS=AXIS2;
RUN; QUIT;
The two initial values of y are defined and stored in an observation by the OUTPUT statement. The second observation contains an additional value y1 for yt−1. Within the loop the values y2 (for yt−2), y1 and y are updated one after the other. The data set used by PROC GPLOT again just contains the observations with t > 0.
/* ar2_epa.sas */
TITLE1 'Empirical partial autocorrelation function';
TITLE2 'of simulated AR(2)-process data';
/* Note that this program requires data1 generated by the previous program (ar2_plot.sas) */

/* ... (PROC ARIMA step computing the partial autocorrelations omitted) ... */
AXIS2 LABEL=('a(k)');

/* Plot autocorrelation function */
PROC GPLOT DATA=corr;
PLOT PARTCORR*LAG / HAXIS=AXIS1 VAXIS=AXIS2 VREF=0;
RUN; QUIT;
This program has to be submitted to SAS for execution within a joint session with Program 2.2.3 (ar2_plot.sas), because it uses the temporary data set data1 generated there. Otherwise you have to add the block of statements concerning the data step to this program. Like in Program 1.3.1 (sunspot_correlogram.sas), the procedure ARIMA with the IDENTIFY statement is used to create a data set. Here we are interested in the variable PARTCORR containing the values of the empirical partial autocorrelation function from the simulated AR(2)-process data. This variable is plotted against the lag stored in the variable LAG.
ARMA-Processes
Moving averages MA(q) and autoregressive AR(p)-processes are special cases of so-called autoregressive moving average processes. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise, $p, q \ge 0$ integers and $a_0,\dots,a_p, b_0,\dots,b_q\in\mathbb{R}$. A real valued stochastic process $(Y_t)_{t\in\mathbb{Z}}$ is said to be an autoregressive moving average process of order $p, q$, denoted by ARMA(p, q), if it satisfies the equation
$$Y_t = a_1Y_{t-1} + \dots + a_pY_{t-p} + \varepsilon_t + b_1\varepsilon_{t-1} + \dots + b_q\varepsilon_{t-q}.$$
A(z) := 1 − a1 z − · · · − ap z p (2.13)
and
B(z) := 1 + b1 z + · · · + bq z q , (2.14)
are the characteristic polynomials of the autoregressive part and of
the moving average part of an ARMA(p, q)-process (Yt ), which we
can represent in the form
$$Y_t = \sum_{v\ge0}\sum_{w=0}^{\min(v,q)} b_w d_{v-w}\,\varepsilon_{t-v} =: \sum_{v\ge0}\alpha_v\varepsilon_{t-v}$$
and, in the invertible case,
$$\varepsilon_t = -\sum_{v\ge0}\sum_{w=0}^{\min(v,p)} a_w g_{v-w}\,Y_{t-v}.$$
$$A(z)\bigl(B(z)D(z)\bigr) = B(z)$$
$$\Leftrightarrow\quad -\sum_{u=0}^{p} a_u z^u \sum_{v\ge0}\alpha_v z^v = \sum_{w=0}^{q} b_w z^w$$
$$\Leftrightarrow\quad -\sum_{w\ge0}\sum_{u+v=w} a_u\alpha_v\, z^w = \sum_{w\ge0} b_w z^w$$
$$\Leftrightarrow\quad -\sum_{w\ge0}\Bigl(\sum_{u=0}^{w} a_u\alpha_{w-u}\Bigr) z^w = \sum_{w\ge0} b_w z^w$$
$$\Leftrightarrow\quad \begin{cases}\alpha_0 = 1,\\[2pt] \alpha_w - \sum_{u=1}^{w} a_u\alpha_{w-u} = b_w &\text{for } 1\le w\le p,\\[2pt] \alpha_w - \sum_{u=1}^{p} a_u\alpha_{w-u} = b_w &\text{for } w > p,\ \text{with } b_w = 0\ \text{for } w > q.\end{cases} \tag{2.15}$$
$$\alpha_0 = 1,\qquad \alpha_1 - a = b,\qquad \alpha_w - a\,\alpha_{w-1} = 0,\quad w\ge2,$$
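Solving this recursion explicitly yields the coefficients of the moving average representation of the ARMA(1,1)-process:
$$\alpha_0 = 1,\qquad \alpha_w = a^{w-1}(a+b),\quad w\ge1,$$
i.e., $Y_t = \varepsilon_t + (a+b)\sum_{w\ge1}a^{w-1}\varepsilon_{t-w}$.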
Taking the covariance of the ARMA-equation with $Y_{t-s}$ implies
$$\gamma(s) - \sum_{u=1}^{p} a_u\gamma(s-u) = \sum_{v=0}^{q} b_v\operatorname{Cov}(Y_{t-s},\varepsilon_{t-v}).$$
From the representation $Y_{t-s} = \sum_{w\ge0}\alpha_w\varepsilon_{t-s-w}$ and Theorem 2.1.5 we obtain
$$\operatorname{Cov}(Y_{t-s},\varepsilon_{t-v}) = \sum_{w\ge0}\alpha_w\operatorname{Cov}(\varepsilon_{t-s-w},\varepsilon_{t-v}) = \begin{cases} 0 & \text{if } v < s,\\ \sigma^2\alpha_{v-s} & \text{if } v\ge s.\end{cases}$$
This implies
$$\gamma(s) - \sum_{u=1}^{p} a_u\gamma(s-u) = \sum_{v=s}^{q} b_v\operatorname{Cov}(Y_{t-s},\varepsilon_{t-v}) = \begin{cases}\sigma^2\sum_{v=s}^{q} b_v\alpha_{v-s} & \text{if } s\le q,\\ 0 & \text{if } s > q,\end{cases}$$
and thus
$$\gamma(0) = \sigma^2\,\frac{1+2ab+b^2}{1-a^2},\qquad \gamma(1) = \sigma^2\,\frac{(1+ab)(a+b)}{1-a^2}.$$
For $s\ge2$ we obtain from (2.16)
$$\gamma(s) = a\,\gamma(s-1) = \dots = a^{s-1}\gamma(1).$$
/* arma11_autocorrelation.sas */
TITLE1 'Autocorrelation functions of ARMA(1,1)-processes';

/* Compute autocorrelation functions for different ARMA(1,1)-processes */
DATA data1;
DO a=-0.8, 0.8;
DO b=-0.5, 0, 0.5;
s=0; rho=1;
q=COMPRESS('(' || a || ',' || b || ')');
OUTPUT;
s=1; rho=(1+a*b)*(a+b)/(1+2*a*b+b*b);
q=COMPRESS('(' || a || ',' || b || ')');
OUTPUT;
DO s=2 TO 10;
rho=a*rho;
q=COMPRESS('(' || a || ',' || b || ')');
OUTPUT;
END;
END;
END;

/* Graphical options */
SYMBOL1 C=RED V=DOT I=JOIN H=0.7 L=1;
SYMBOL2 C=YELLOW V=DOT I=JOIN H=0.7 L=2;
SYMBOL3 C=BLUE V=DOT I=JOIN H=0.7 L=33;
SYMBOL4 C=RED V=DOT I=JOIN H=0.7 L=3;
SYMBOL5 C=YELLOW V=DOT I=JOIN H=0.7 L=4;
ARIMA-Processes
Suppose that the time series $(Y_t)$ has a polynomial trend of degree $d$.
Then we can eliminate this trend by considering the process $(\Delta^d Y_t)$,
obtained by $d$ times differencing as described in Section 1.2. If the
filtered process $(\Delta^d Y_t)$ is an ARMA(p, q)-process satisfying the stationarity condition (2.4), the original process $(Y_t)$ is said to be an
autoregressive integrated moving average of order $p, d, q$, denoted by
ARIMA(p, d, q). In this case constants $a_1,\dots,a_p$, $b_0 = 1$, $b_1,\dots,b_q\in\mathbb{R}$ exist such that
$$\Delta^d Y_t = \sum_{u=1}^{p} a_u\,\Delta^d Y_{t-u} + \sum_{w=0}^{q} b_w\,\varepsilon_{t-w},\qquad t\in\mathbb{Z},$$
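In SAS an ARIMA(p, d, q)-model can be fitted with PROC ARIMA, where the differencing order d is attached to the variable in the IDENTIFY statement. The following sketch (data set and variable names are illustrative) fits an ARIMA(1,1,1)-model:

/* arima_fit.sas (illustrative sketch) */
PROC ARIMA DATA=data1;
IDENTIFY VAR=y(1);             /* first order differencing, d=1 */
ESTIMATE P=1 Q=1 METHOD=ML;    /* fit an ARMA(1,1) to the differenced series */
RUN; QUIT;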
Cointegration
In the sequel we will frequently use the notation that a time series
(Yt ) is I(d), d = 0, 1, . . . if the sequence of differences (∆d Yt ) of order
d is a stationary process. By the difference ∆0 Yt of order zero we
denote the undifferenced process Yt , t ∈ Z.
Suppose that the two time series (Yt ) and (Zt ) satisfy
Yt = aWt + εt , Zt = Wt + δt , t ∈ Z,
for some real number $a \ne 0$, where $(W_t)$ is I(1), and $(\varepsilon_t)$, $(\delta_t)$ are
uncorrelated white noise processes, i.e., $\operatorname{Cov}(\varepsilon_t,\delta_s) = 0$, $t,s\in\mathbb{Z}$, and
both are uncorrelated with $(W_t)$.
Then (Yt ) and (Zt ) are both I(1), but
Xt := Yt − aZt = εt − aδt , t ∈ Z,
is I(0).
The fact that the combination of two nonstationary series yields a sta-
tionary process arises from a common component (Wt ), which is I(1).
More generally, two I(1) series $(Y_t)$, $(Z_t)$ are said to be cointegrated if there exist constants $\mu$, $\alpha_1$, $\alpha_2$ with $\alpha_1,\alpha_2 \ne 0$ such that their linear combination
$$X_t = \mu + \alpha_1 Y_t + \alpha_2 Z_t,\qquad t\in\mathbb{Z}, \tag{2.17}$$
is I(0).
Consider, for example, the random walks of a drunkard and his dog,
$$Y_t = Y_{t-1} + \varepsilon_t\quad\text{and}\quad Z_t = Z_{t-1} + \delta_t,$$
where the individual single steps $(\varepsilon_t)$, $(\delta_t)$ of man and dog are uncorrelated white noise processes. Random walks are not stationary, since their variances increase, and so both processes $(Y_t)$ and $(Z_t)$ are not stationary.
And if the dog belongs to the drunkard? We assume the dog to
be unleashed and thus, the distance Yt − Zt between the drunk and
his dog is a random variable. It seems reasonable to assume that
these distances form a stationary process, i.e., that (Yt ) and (Zt ) are
cointegrated with constants α1 = 1 and α2 = −1.
Cointegration requires that both variables in question be I(1), but
that a linear combination of them be I(0). This means that the first
step is to figure out if the series themselves are I(1), typically by using
unit root tests. If one or both are not I(1), cointegration of order 1 is
not an option.
Whether two processes (Yt ) and (Zt ) are cointegrated can be tested
by means of a linear regression approach. This is based on the coin-
2.2 Moving Averages and Autoregressive Processes 83
tegration regression
Yt = β0 + β1 Zt + εt ,
where (εt ) is a stationary process and β0 , β1 ∈ R are the cointegration
constants.
One can use the ordinary least squares estimates $\hat\beta_0$, $\hat\beta_1$ of the target parameters $\beta_0$, $\beta_1$, which satisfy
$$\sum_{t=1}^{n}\bigl(Y_t - \hat\beta_0 - \hat\beta_1 Z_t\bigr)^2 = \min_{\beta_0,\beta_1\in\mathbb{R}}\ \sum_{t=1}^{n}\bigl(Y_t - \beta_0 - \beta_1 Z_t\bigr)^2,$$
(i) Determine that the two series are I(1) by standard unit root tests such as Dickey–Fuller or augmented Dickey–Fuller.

(ii) Compute the residuals $\hat\varepsilon_t = Y_t - \hat\beta_0 - \hat\beta_1 Z_t$ by ordinary least squares.

(iii) Examine $\hat\varepsilon_t$ for stationarity, using for example the Phillips–Ouliaris test.
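In SAS, steps (ii) and (iii) can be carried out, for example, with PROC AUTOREG, whose STATIONARITY=(PHILLIPS) option computes the Phillips–Ouliaris test from the residuals of the regression. The sketch below uses the variable names of the Hog Data example that follows:

/* cointegration_test.sas (illustrative sketch) */
PROC AUTOREG DATA=data3;
/* cointegration regression of supply on price with Phillips-Ouliaris test */
MODEL supply=price / STATIONARITY=(PHILLIPS);
RUN;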
Example 2.2.11. (Hog Data) Quenouille's (1968) Hog Data list the
annual hog supply and hog prices in the U.S. between 1867 and 1948.
Do they provide a typical example of cointegrated series? A discussion
can be found in Box and Tiao (1977).
84 Models of Time Series
/* hog.sas */
TITLE1 'Hog supply, hog prices and differences';
TITLE2 'Hog Data (1867-1948)';
/* Note that this program requires the macro mkfields.sas to be submitted before this program */

/* Read in the two data sets */
DATA data1;
INFILE 'c:\data\hogsuppl.txt';
INPUT supply @@;

DATA data2;
INFILE 'c:\data\hogprice.txt';
INPUT price @@;

/* Merge data sets, generate year and compute differences */
DATA data3;
MERGE data1 data2;
year=_N_+1866;
diff=supply-price;

/* Graphical options */
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;
AXIS1 LABEL=(ANGLE=90 'h o g s u p p l y');
AXIS2 LABEL=(ANGLE=90 'h o g p r i c e s');
AXIS3 LABEL=(ANGLE=90 'd i f f e r e n c e s');

/* Generate three plots */
GOPTIONS NODISPLAY;
PROC GPLOT DATA=data3 GOUT=abb;
PLOT supply*year / VAXIS=AXIS1;
PLOT price*year / VAXIS=AXIS2;
PLOT diff*year / VAXIS=AXIS3 VREF=0;
RUN;

/* Display them in one output */
GOPTIONS DISPLAY;
PROC GREPLAY NOFS IGOUT=abb TC=SASHELP.TEMPLT;
TEMPLATE=V3;
TREPLAY 1:GPLOT 2:GPLOT1 3:GPLOT2;
RUN; DELETE _ALL_; QUIT;
The supply data and the price data read in from two external files are merged in data3. Year is an additional variable with values 1867, 1868, ..., 1948. By PROC GPLOT hog supply, hog prices and their differences diff are plotted in three different plots stored in the graphics catalog abb. The horizontal line at the zero level is plotted by the option VREF=0. The plots are put into a common graphic using PROC GREPLAY and the template V3. Note that the labels of the vertical axes are spaced out as SAS sets their characters too close otherwise.
For the program to work properly the macro mkfields.sas has to be submitted beforehand.
Hog supply (=: yt ) and hog price (=: zt ) obviously increase in time t
and do, therefore, not seem to be realizations of stationary processes;
nevertheless, as they behave similarly, a linear combination of both
might be stationary. In this case, hog supply and hog price would be
cointegrated.
This phenomenon can easily be explained as follows. A high price zt
at time t is a good reason for farmers to breed more hogs, thus leading
to a large supply yt+1 in the next year t + 1. This makes the price zt+1
fall with the effect that farmers will reduce their supply of hogs in the
following year t + 2. However, when hogs are in short supply, their
price $z_{t+2}$ will rise again, etc. There is obviously some error correction mechanism inherent in these two processes, and the observed cointegration of hog supply and hog prices reflects it.

[SAS output: Dickey–Fuller test statistics 0.12676 and 0.86027 with corresponding p-values 0.70968 and 0.88448.]
In the next step the corresponding test statistics are calculated by (2.21). The factor 81 comes from the fact that the hog data contain 82 observations and the regression is carried out with 81 observations.
After that the corresponding p-values are computed. The function PROBDF, which completes this task, expects four arguments: first the test statistic, then the sample size of the regression, then the number of autoregressive variables in (2.18) to (2.20) (in our case 1), and finally a three-letter specification of which of the models (2.18) to (2.20) is to be tested. The first letter states in which way γ is estimated (R for regression, S for a studentized test statistic which we did not explain), and the last two letters state the model (ZM (zero mean) for (2.18), SM (single mean) for (2.19), TR (trend) for (2.20)).
In the final step the test statistics and corresponding p-values are given to the output window.
The p-values do not lead to a rejection of the hypothesis that we have two I(1) series under model (2.18) at the 5%-level, since they are both larger than 0.05 and thus support that γ = 0.
Since we have checked that both hog series can be regarded as I(1)
we can now check for cointegration.
Phillips–Ouliaris Cointegration Test

[SAS output: test statistics RHO = −28.9109 and TAU = −4.0142.]
(i) If model (2.18) with γ = 0 has been validated for both series,
then use the following table for critical values of RHO and TAU.
This is the so-called standard case.
(ii) If model (2.19) with γ = 0 has been validated for both series,
then use the following table for critical values of RHO and TAU.
This case is referred to as demeaned .
α 0.15 0.125 0.1 0.075 0.05 0.025 0.01
RHO -14.91 -15.93 -17.04 -18.48 -20.49 -23.81 -28.32
TAU -2.86 -2.96 -3.07 -3.20 -3.37 -3.64 -3.96
(iii) If model (2.20) with γ = 0 has been validated for both series,
then use the following table for critical values of RHO and TAU.
This case is said to be demeaned and detrended .
α 0.15 0.125 0.1 0.075 0.05 0.025 0.01
RHO -20.79 -21.81 -23.19 -24.75 -27.09 -30.84 -35.42
TAU -3.33 -3.42 -3.52 -3.65 -3.80 -4.07 -4.36
which yields
$$\sigma^2 = \frac{a_0}{1-\sum_{j=1}^{p} a_j}.$$
The squared process satisfies
$$Y_t^2 = a_0 + a_1 Y_{t-1}^2 + \dots + a_p Y_{t-p}^2 + \varepsilon_t,$$
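A simulated path illustrates this. The following sketch (parameters a0 = 0.5, a1 = 0.4 and the seed are illustrative) generates an ARCH(1)-series Yt = σtZt with standard normal Zt; by the equation above, its squares then follow an autoregression of order 1.

/* arch1_simulation.sas (illustrative sketch) */
DATA arch1;
a0=0.5; a1=0.4;
y=0;                          /* start value Y_0 = 0 */
DO t=1 TO 500;
sigma2=a0+a1*y**2;            /* conditional variance sigma_t^2 */
y=SQRT(sigma2)*RANNOR(42);    /* Y_t = sigma_t * Z_t */
y2=y**2;
OUTPUT;
END;
KEEP t y y2;
RUN;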
yt := log(pt ) − log(pt−1 ),
Plot 2.2.9: Log returns of Hang Seng index and their squares.
/* hongkong_plot.sas */
TITLE1 'Daily log returns and their squares';
TITLE2 'Hongkong Data';

/* Read in the data, compute log returns and their squares */
DATA data1;
INFILE 'c:\data\hongkong.txt';
INPUT p @@;
t=_N_;
y=DIF(LOG(p));
y2=y**2;

/* Graphical options */
SYMBOL1 C=RED V=DOT H=0.5 I=JOIN L=1;
AXIS1 LABEL=('y' H=1 't') ORDER=(-.12 TO .10 BY .02);
AXIS2 LABEL=('y2' H=1 't');

GOPTIONS NODISPLAY;
PROC GPLOT DATA=data1 GOUT=abb;
PLOT y*t / VAXIS=AXIS1;
PLOT y2*t / VAXIS=AXIS2;
RUN;

/* Display them in one output */
GOPTIONS DISPLAY;
PROC GREPLAY NOFS IGOUT=abb TC=SASHELP.TEMPLT;
TEMPLATE=V2;
TREPLAY 1:GPLOT 2:GPLOT1;
RUN; DELETE _ALL_; QUIT;
In the DATA step the observed values of the Hang Seng closing index are read into the variable p from an external file. The time index variable t uses the SAS-variable _N_, and the log transformed and differenced values of the index are stored in the variable y, their squared values in y2.
After defining different axis labels, two plots are generated by two PLOT statements in PROC GPLOT, but they are not displayed. By means of PROC GREPLAY the plots are merged vertically in one graphic.
The four steps are discussed in the following. The application of the
Box–Jenkins Program is presented in a case study in Chapter 7.
Order Selection
The order q of a moving average MA(q)-process can be estimated
by means of the empirical autocorrelation function r(k), i.e., by the
correlogram. Part (iv) of Lemma 2.2.1 shows that the autocorrelation
function ρ(k) vanishes for k ≥ q + 1. This suggests choosing the
order q such that r(q) is clearly different from zero, whereas r(k) for
k ≥ q + 1 is quite close to zero. This, however, is obviously a rather
vague selection rule.
$$\mathrm{BIC}(p,q) := \log(\hat\sigma_{p,q}^2) + \frac{(p+q)\log(n)}{n}$$
and the Hannan–Quinn Criterion
$$\mathrm{HQ}(p,q) := \log(\hat\sigma_{p,q}^2) + \frac{2(p+q)c\log(\log(n))}{n},\qquad c > 1.$$
Estimation of Coefficients
Suppose we have fixed the orders p and q of an ARMA(p, q)-process $(Y_t)_{t\in\mathbb{Z}}$,
with $Y_1,\dots,Y_n$ now modelling the data $y_1,\dots,y_n$. In the next step
we will derive estimators of the constants $a_1,\dots,a_p$, $b_1,\dots,b_q$ in the
model
$$\varphi_{\mu,\Sigma}(x_1,\dots,x_n) = \frac{1}{(2\pi)^{n/2}(\det\Sigma)^{1/2}}\exp\Bigl(-\frac12\bigl((x_1,\dots,x_n)-\mu^T\bigr)\Sigma^{-1}\bigl((x_1,\dots,x_n)^T-\mu\bigr)\Bigr)$$
is for arbitrary $x_1,\dots,x_n\in\mathbb{R}$ the density of the n-dimensional normal distribution with mean vector $\mu = (\mu,\dots,\mu)^T\in\mathbb{R}^n$ and covariance matrix $\Sigma = (\gamma(i-j))_{1\le i,j\le n}$, denoted by $N(\mu,\Sigma)$, where $\mu = E(Y_0)$ and $\gamma$ is the autocovariance function of the stationary process $(Y_t)$.
The number $\varphi_{\mu,\Sigma}(x_1,\dots,x_n)$ reflects the probability that the random vector $(Y_1,\dots,Y_n)$ realizes close to $(x_1,\dots,x_n)$. Precisely, we have for $\varepsilon\downarrow0$
$$P\{Y_i\in[x_i-\varepsilon,\,x_i+\varepsilon],\ i=1,\dots,n\} \approx \varepsilon^n 2^n\,\varphi_{\mu,\Sigma}(x_1,\dots,x_n).$$
The matrix
$$\Sigma' := \sigma^{-2}\Sigma,$$
therefore, depends only on $a_1,\dots,a_p$ and $b_1,\dots,b_q$.
We can now write the density $\varphi_{\mu,\Sigma}(x_1,\dots,x_n)$ as a function of $\vartheta := (\sigma^2,\mu,a_1,\dots,a_p,b_1,\dots,b_q)\in\mathbb{R}^{p+q+2}$ and $(x_1,\dots,x_n)\in\mathbb{R}^n$:
$$p(x_1,\dots,x_n\mid\vartheta) := \varphi_{\mu,\Sigma}(x_1,\dots,x_n) = (2\pi\sigma^2)^{-n/2}(\det\Sigma')^{-1/2}\exp\Bigl(-\frac{1}{2\sigma^2}Q(\vartheta\mid x_1,\dots,x_n)\Bigr),$$
where
$$Q(\vartheta\mid x_1,\dots,x_n) := \bigl((x_1,\dots,x_n)-\mu^T\bigr)\,\Sigma'^{-1}\bigl((x_1,\dots,x_n)^T-\mu\bigr)$$
$$l(\hat\vartheta\mid y_1,\dots,y_n) = \sup_{\vartheta}\, l(\vartheta\mid y_1,\dots,y_n) = \sup_{\vartheta}\Bigl(-\frac{n}{2}\log(2\pi\sigma^2) - \frac12\log(\det\Sigma') - \frac{1}{2\sigma^2}Q(\vartheta\mid y_1,\dots,y_n)\Bigr).$$
$$\Sigma' = \frac{1}{1-a^2}\begin{pmatrix}1 & a & a^2 & \dots & a^{n-1}\\ a & 1 & a & \dots & a^{n-2}\\ \vdots & \vdots & \ddots & & \vdots\\ a^{n-1} & a^{n-2} & a^{n-3} & \dots & 1\end{pmatrix}.$$
$$L(\vartheta\mid y_1,\dots,y_n) = (2\pi\sigma^2)^{-n/2}(1-a^2)^{1/2}\exp\Bigl(-\frac{1}{2\sigma^2}Q(\vartheta\mid y_1,\dots,y_n)\Bigr),$$
where
$$\begin{aligned}Q(\vartheta\mid y_1,\dots,y_n) &= \bigl((y_1,\dots,y_n)-\mu\bigr)\Sigma'^{-1}\bigl((y_1,\dots,y_n)-\mu\bigr)^T\\ &= (y_1-\mu)^2 + (y_n-\mu)^2 + (1+a^2)\sum_{i=2}^{n-1}(y_i-\mu)^2 - 2a\sum_{i=1}^{n-1}(y_i-\mu)(y_{i+1}-\mu).\end{aligned}$$
based on Yt−1 , . . . , Yt−p and εt−1 , . . . , εt−q . The prediction error is given
by the residual
Yt − Ŷt = εt .
Suppose that $\hat\varepsilon_t$ is an estimator of $\varepsilon_t$, $t\le n$, which depends on the choice of constants $a_1,\dots,a_p$, $b_1,\dots,b_q$ and satisfies the recursion
$$\hat\varepsilon_t = y_t - a_1y_{t-1} - \dots - a_py_{t-p} - b_1\hat\varepsilon_{t-1} - \dots - b_q\hat\varepsilon_{t-q}.$$
The function
$$S^2(a_1,\dots,a_p,b_1,\dots,b_q) = \sum_{t=-\infty}^{n}\hat\varepsilon_t^2 = \sum_{t=-\infty}^{n}\bigl(y_t - a_1y_{t-1} - \dots - a_py_{t-p} - b_1\hat\varepsilon_{t-1} - \dots - b_q\hat\varepsilon_{t-q}\bigr)^2$$
is the residual sum of squares, and the least squares approach suggests
estimating the underlying set of constants by minimizers $a_1,\dots,a_p$
and $b_1,\dots,b_q$ of $S^2$. Note that the residuals $\hat\varepsilon_t$ and the constants are
nested.
We have no observation yt available for t ≤ 0. But from the assump-
tion E(εt ) = 0 and thus E(Yt ) = 0, it is reasonable to backforecast yt
by zero and to put ε̂t := 0 for t ≤ 0, leading to
$$S^2(a_1,\dots,a_p,b_1,\dots,b_q) = \sum_{t=1}^{n}\hat\varepsilon_t^2.$$
The estimated residuals $\hat\varepsilon_t$ can then be computed from the recursion
$$\begin{aligned}\hat\varepsilon_1 &= y_1\\ \hat\varepsilon_2 &= y_2 - a_1y_1 - b_1\hat\varepsilon_1\\ \hat\varepsilon_3 &= y_3 - a_1y_2 - a_2y_1 - b_1\hat\varepsilon_2 - b_2\hat\varepsilon_1\\ &\ \,\vdots\\ \hat\varepsilon_j &= y_j - a_1y_{j-1} - \dots - a_py_{j-p} - b_1\hat\varepsilon_{j-1} - \dots - b_q\hat\varepsilon_{j-q},\end{aligned} \tag{2.23}$$
where $j$ now runs from $\max\{p,q\}+1$ to $n$.
For example, for an ARMA(2, 3)-process we have
$$\begin{aligned}\hat\varepsilon_1 &= y_1\\ \hat\varepsilon_2 &= y_2 - a_1y_1 - b_1\hat\varepsilon_1\\ \hat\varepsilon_3 &= y_3 - a_1y_2 - a_2y_1 - b_1\hat\varepsilon_2 - b_2\hat\varepsilon_1\\ \hat\varepsilon_4 &= y_4 - a_1y_3 - a_2y_2 - b_1\hat\varepsilon_3 - b_2\hat\varepsilon_2 - b_3\hat\varepsilon_1\\ \hat\varepsilon_5 &= y_5 - a_1y_4 - a_2y_3 - b_1\hat\varepsilon_4 - b_2\hat\varepsilon_3 - b_3\hat\varepsilon_2\\ &\ \,\vdots\end{aligned}$$
From the fourth step on, this recursion involves the full set of coefficients of the ARMA(2, 3)-model.
The coefficients a1 , . . . , ap of a pure AR(p)-process can be estimated
directly, using the Yule–Walker equations as described in (2.8).
Diagnostic Check
Suppose that the orders p and q as well as the constants a1 , . . . , ap
and b1 , . . . , bq have been chosen in order to model an ARMA(p, q)-
process underlying the data. The Portmanteau-test of Box and Pierce
(1970) checks whether the estimated residuals $\hat\varepsilon_t$, $t = 1,\dots,n$, behave
approximately like realizations from a white noise process. To this end
one considers the pertaining empirical autocorrelation function
one considers the pertaining empirical autocorrelation function
Pn−k
j=1 (ε̂j − ε̄)(ε̂j+k − ε̄)
r̂ε̂ (k) := Pn 2
, k = 1, . . . , n − 1, (2.24)
j=1 (ε̂ j − ε̄)
where ε̄ = n−1 nj=1 ε̂j , and checks, whether the values r̂ε (k) are suf-
P
ficiently close to zero. This decision is based on
$$Q(K) := n\sum_{k=1}^{K}\hat r_{\hat\varepsilon}^2(k),$$
A modification with better finite sample properties is due to Ljung and Box (1978):
$$Q^*(K) := n\sum_{k=1}^{K}\Bigl(\Bigl(\frac{n+2}{n-k}\Bigr)^{1/2}\hat r_{\hat\varepsilon}(k)\Bigr)^2 = n(n+2)\sum_{k=1}^{K}\frac{1}{n-k}\,\hat r_{\hat\varepsilon}^2(k).$$
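Under the hypothesis that the residuals stem from a white noise process, $Q^*(K)$ is asymptotically $\chi^2$-distributed with $K - p - q$ degrees of freedom; the fitted ARMA(p, q)-model is therefore rejected if $Q^*(K)$ exceeds the corresponding $\chi^2$-quantile.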
Forecasting
We want to determine weights $c_0^*,\dots,c_{n-1}^*\in\mathbb{R}$ such that for $h\in\mathbb{N}$
$$E\Bigl(\bigl(Y_{n+h}-\hat Y_{n+h}\bigr)^2\Bigr) = \min_{c_0,\dots,c_{n-1}\in\mathbb{R}}\ E\Bigl(\Bigl(Y_{n+h}-\sum_{u=0}^{n-1}c_uY_{n-u}\Bigr)^2\Bigr),$$
with $\hat Y_{n+h} := \sum_{u=0}^{n-1}c_u^*Y_{n-u}$. Then $\hat Y_{n+h}$ with minimum mean squared error is said to be a best (linear) h-step forecast of $Y_{n+h}$, based on $Y_1,\dots,Y_n$. The following result provides a sufficient condition for the optimality of weights.
$Y_1,\dots,Y_n$. Then we have
$$\begin{aligned} E\bigl((Y_{n+h}-\tilde Y_{n+h})^2\bigr) &= E\bigl((Y_{n+h}-\hat Y_{n+h}+\hat Y_{n+h}-\tilde Y_{n+h})^2\bigr)\\ &= E\bigl((Y_{n+h}-\hat Y_{n+h})^2\bigr) + 2\sum_{u=0}^{n-1}(c_u^*-c_u)E\bigl(Y_{n-u}(Y_{n+h}-\hat Y_{n+h})\bigr) + E\bigl((\hat Y_{n+h}-\tilde Y_{n+h})^2\bigr)\\ &= E\bigl((Y_{n+h}-\hat Y_{n+h})^2\bigr) + E\bigl((\hat Y_{n+h}-\tilde Y_{n+h})^2\bigr)\\ &\ge E\bigl((Y_{n+h}-\hat Y_{n+h})^2\bigr). \end{aligned}$$
Suppose that $(Y_t)$ is a stationary process with mean zero and autocorrelation function $\rho$. The equations (2.25) are then of Yule–Walker type:
$$\rho(h+s) = \sum_{u=0}^{n-1}c_u^*\,\rho(s-u),\qquad s = 0,1,\dots,n-1,$$
or, in matrix language,
$$\begin{pmatrix}\rho(h)\\ \rho(h+1)\\ \vdots\\ \rho(h+n-1)\end{pmatrix} = \boldsymbol{P}_n\begin{pmatrix}c_0^*\\ c_1^*\\ \vdots\\ c_{n-1}^*\end{pmatrix} \tag{2.26}$$
from which the assertion for h = 1 follows by Lemma 2.3.2. The case
of an arbitrary h ≥ 2 is now a consequence of the recursion
until Ybi and ε̂i are defined for i = 1, . . . , n. The actual forecast is then
given by
Exercises
2.1. Show that the expectation of complex valued random variables
is linear, i.e.,
E(aY + bZ) = a E(Y ) + b E(Z),
where a, b ∈ C and Y, Z are integrable.
Show that
Cov(Y, Z) = E(Y Z̄) − E(Y )E(Z̄)
for square integrable complex random variables Y and Z.
2.2. Suppose that the complex random variables Y and Z are square
integrable. Show that
2.3. Give an example of a stochastic process (Yt ) such that for arbi-
trary $t_1, t_2\in\mathbb{Z}$ and $k \ne 0$
stationary?
2.7. Show that the autocovariance function γ : Z → C of a complex-
valued stationary process (Yt )t∈Z , which is defined by
(i) Show that $c(k)$ is a biased estimator of $\gamma(k)$ (even if the factor $n^{-1}$ is replaced by $(n-k)^{-1}$), i.e., $E(c(k)) \ne \gamma(k)$.
$$C_k := \begin{pmatrix} c(0) & c(1) & \dots & c(k-1)\\ c(1) & c(0) & \dots & c(k-2)\\ \vdots & \vdots & \ddots & \vdots\\ c(k-1) & c(k-2) & \dots & c(0) \end{pmatrix}.$$
2.11. Let (Yt )t∈Z be a stationary process with mean zero. If its au-
tocovariance function satisfies γ(τ ) = 0 for some τ > 0, then γ is
periodic with length τ , i.e., γ(t + τ ) = γ(t), t ∈ Z.
2.14. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise. The process $Y_t = \sum_{s=1}^{t}\varepsilon_s$ is said
to be a random walk. Plot the path of a random walk with normal
$N(\mu,\sigma^2)$ distributed $\varepsilon_t$ for each of the cases $\mu < 0$, $\mu = 0$ and $\mu > 0$.
2.15. Let $(a_u)$, $(b_u)$ be absolutely summable filters and let $(Z_t)$ be a
stochastic process with $\sup_{t\in\mathbb{Z}} E(Z_t^2) < \infty$. Put for $t\in\mathbb{Z}$
$$X_t = \sum_u a_u Z_{t-u},\qquad Y_t = \sum_v b_v Z_{t-v}.$$
Then we have
$$E(X_tY_t) = \sum_u\sum_v a_u b_v\, E(Z_{t-u}Z_{t-v}).$$
2.18. Compute the orders p and the coefficients a_u of the process
Y_t = Σ_{u=0}^{p} a_u ε_{t−u} with Var(ε_0) = 1 and autocovariance function γ(1) =
2, γ(2) = 1, γ(3) = −1 and γ(t) = 0 for t ≥ 4. Is this process
invertible?
and show that (ε̃t ) is a white noise. Show that Var(ε̃t ) = a2 Var(εt )
and (Yt ) has the invertible representation
(iv) two roots are outside, one root inside the unit disk,
(v) one root is outside, one root is inside the unit disk and one root
is on the unit circle,
(vi) all roots are outside the unit disk but close to the unit circle.
2.25. Show that the AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt for a1 =
1/3 and a2 = 2/9 has the autocorrelation function
ρ(k) = (16/21)(2/3)^{|k|} + (5/21)(−1/3)^{|k|},  k ∈ Z,
and for a1 = a2 = 1/12 the autocorrelation function
ρ(k) = (45/77)(1/3)^{|k|} + (32/77)(−1/4)^{|k|},  k ∈ Z.
2.26. Let (εt ) be a white noise with E(ε0 ) = µ, Var(ε0 ) = σ 2 and put
Yt = εt − Yt−1 , t ∈ N, Y0 = 0.
Show that
Corr(Y_s, Y_t) = (−1)^{s+t} min{s, t}/√(st).
2.27. An AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt satisfies the station-
arity condition (2.4), if the pair (a1 , a2 ) is in the triangle
∆ := { (α, β) ∈ R² : −1 < β < 1, α + β < 1 and β − α < 1 }.
2.31. Show that the value at lag 2 of the partial autocorrelation func-
tion of the MA(1)-process
Y_t = ε_t + aε_{t−1},  t ∈ Z,
is
α(2) = −a²/(1 + a² + a⁴).
2.32. (Unemployed1 Data) Plot the empirical autocorrelations and
partial autocorrelations of the trend and seasonally adjusted Unem-
ployed1 Data from the building trade, introduced in Example 1.1.1.
Apply the Box–Jenkins program. Is a fit of a pure MA(q)- or AR(p)-
process reasonable?
2.36. Let (ε_t)_{t∈Z} be a white noise. The process W_t = Σ_{s=1}^{t} ε_s is then
called a random walk. Generate two independent random walks µ_t,
ν_t, t = 1, . . . , 100, where the ε_t are standard normal and independent.
Simulate from these
X_t = µ_t + δ_t^{(1)},  Y_t = µ_t + δ_t^{(2)},  Z_t = ν_t + δ_t^{(3)},
where the δ_t^{(i)} are again independent and standard normal, i = 1, 2, 3.
Plot the generated processes Xt , Yt and Zt and their first order dif-
ferences. Which of these series are cointegrated? Check this by the
Phillips-Ouliaris-test.
2.37. (US Interest Data) The file “us interest rates.txt” contains the
interest rates of three-month, three-year and ten-year federal bonds of
the USA, monthly collected from July 1954 to December 2002. Plot
the data and the corresponding differences of first order. Test also
whether the data are I(1). Check next if the three series are pairwise
cointegrated.
(iii) Evaluate E(Y_t⁴) and deduce that E(Z_1⁴) a_1² < 1 is a sufficient
condition for E(Y_t⁴) < ∞.
2.43. (Zurich Data) The daily value of the Zurich stock index was
recorded between January 1st, 1988 and December 31st, 1988. Use
a difference filter of first order to remove a possible trend. Plot the
(trend-adjusted) data, their squares, the pertaining partial autocor-
relation function and parameter estimates. Can the squared process
be considered as an AR(1)-process?
2.44. Show that the matrix Σ_0^{−1} in Example 2.3.1 has the determi-
nant 1 − a².
2.45. (Car Data) Apply the Box–Jenkins program to the Car Data.
Chapter 3
State-Space Models
In state-space models we have, in general, a nonobservable target
process (X_t) and an observable process (Y_t). They are linked by the
assumption that (Y_t) is a linear function of (X_t) with an added noise,
where the linear function may vary in time. The aim is the derivation
of best linear estimates of X_t, based on (Y_s)_{s≤t}.
They are assumed to satisfy the state equation
X_{t+1} = A_t X_t + B_t ε_{t+1}   (3.1)
and the observation equation
Y_t = C_t X_t + η_t ∈ R^m.   (3.2)
We assume that (At ), (Bt ) and (Ct ) are sequences of known matrices,
(εt ) and (ηt ) are uncorrelated sequences of white noises with mean
vectors 0 and known covariance matrices Cov(εt ) = E(εt εTt ) =: Qt ,
Cov(ηt ) = E(ηt ηtT ) =: Rt .
We suppose further that X_0 and ε_t, η_t, t ≥ 1, are uncorrelated, where
two random vectors W ∈ R^p and V ∈ R^q are said to be uncorrelated
if their components are, i.e., if the matrix Cov(W, V) of their covariances vanishes.
Yt := µt + ηt
and put
A := ( 1  b
       0  1 ).
From the recursion µ_{t+1} = µ_t + b we then obtain the state equation
X_{t+1} = ( µ_{t+1}, 1 )^T = A ( µ_t, 1 )^T = A X_t,
and with
C := (1, 0)
the observation equation
Y_t = (1, 0)( µ_t, 1 )^T + η_t = C X_t + η_t.
An AR(p)-process
Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t
admits a state-space representation with the p × 1-matrix
B := (1, 0, . . . , 0)^T =: C^T
and the noiseless observation equation
Y_t = C X_t.
Similarly, an MA(q)-process
Y_t = ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}
has a representation with state vector built from ε_t, . . . , ε_{t−q},
the (q + 1) × 1-matrix
B := (1, 0, . . . , 0)^T,
the 1 × (q + 1)-matrix
C := (1, b_1, . . . , b_q)
and the observation equation
Y_t = C X_t.
Example 3.1.4. Combining the above results for AR(p) and MA(q)-
processes, we obtain a state-space representation of ARMA(p, q)-pro-
cesses
the (p + q) × 1-matrix
B := (1, 0, . . . , 0, 1, 0, . . . , 0)T
with the entry 1 at the first and (p + 1)-th position and the 1 × (p + q)-
matrix
C := (1, 0, . . . , 0).
since in the second-to-last line the final term is nonnegative and the
second one vanishes by the property (3.5).
Let now X̂t−1 be a linear prediction of Xt−1 fulfilling (3.5) based on
the observations Y1 , . . . , Yt−1 . Then
Ỹt := Ct X̃t
Proof. The matrix Kt has to be chosen such that Xt − X̂t and Ys are
uncorrelated for s = 1, . . . , t, i.e.,
0 = E((Xt − X̂t )YsT ) = E((Xt − X̃t − Kt (Yt − Ỹt ))YsT ), s ≤ t.
But this is the assertion of Lemma 3.2.2. Note that Ỹt is a linear
combination of Y1 , . . . , Yt−1 and that ηt and Xt − X̃t are uncorrelated.
The recursion in the discrete Kalman filter is done in two steps: From
X̂_{t−1} and ∆̂_{t−1} one computes in the prediction step first
X̃_t = A_{t−1} X̂_{t−1},
Ỹ_t = C_t X̃_t,
∆̃_t = A_{t−1} ∆̂_{t−1} A_{t−1}^T + B_{t−1} Q_t B_{t−1}^T.   (3.9)
In the updating step one then computes K_t and the updated values X̂_t, ∆̂_t
K_t = ∆̃_t C_t^T ( C_t ∆̃_t C_t^T + R_t )^{−1},
X̂_t = X̃_t + K_t ( Y_t − Ỹ_t ),
∆̂_t = ∆̃_t − K_t C_t ∆̃_t.   (3.10)
An obvious problem is the choice of the initial values X̃_1 and ∆̃_1. One
frequently puts X̃_1 = 0 and ∆̃_1 as the diagonal matrix with constant
entries σ² > 0. The number σ² reflects the degree of uncertainty about
the underlying model. Simulations as well as theoretical results show,
however, that the estimates X̂_t are often not affected by the initial
values X̃_1 and ∆̃_1 if t is large, see for instance Example 3.2.3 below.
If in addition we require that the state-space model (3.1), (3.2) is
completely determined by some parametrization ϑ of the distribution
of (Yt ) and (Xt ), then we can estimate the matrices of the Kalman
filter in (3.9) and (3.10) under suitable conditions by a maximum
likelihood estimate of ϑ; see e.g. Brockwell and Davis (2002, Section
8.5) or Janacek and Swift (1993, Section 4.5).
By iterating the 1-step prediction X̃t = At−1 X̂t−1 of Xt in (3.6)
h times, we obtain the h-step prediction of the Kalman filter
X̃t+h := At+h−1 X̃t+h−1 , h ≥ 1,
with the initial value X̃t+0 := X̂t . The pertaining h-step prediction
of Yt+h is then
Ỹt+h := Ct+h X̃t+h , h ≥ 1.
Example 3.2.3. Let (ηt ) be a white noise in R with E(ηt ) = 0,
E(ηt2 ) = σ 2 > 0 and put for some µ ∈ R
Yt := µ + ηt , t ∈ Z.
Note that all these values are in R. The h-step predictions X̃_{t+h}, Ỹ_{t+h}
are, therefore, given by X̃_{t+h} = X̂_t. The update step (3.10) of the
Kalman filter is
K_t = ∆̂_{t−1}/( ∆̂_{t−1} + σ² ),
X̂_t = X̂_{t−1} + K_t ( Y_t − X̂_{t−1} ),
∆̂_t = ∆̂_{t−1} − K_t ∆̂_{t−1} = ∆̂_{t−1} σ²/( ∆̂_{t−1} + σ² ).
Note that ∆̂_t = E((X_t − X̂_t)²) ≥ 0 and thus,
0 ≤ ∆̂_t = ∆̂_{t−1} σ²/( ∆̂_{t−1} + σ² ) ≤ ∆̂_{t−1},
i.e., (∆̂_t)_t is a decreasing and bounded sequence. Its limit ∆ := lim_{t→∞} ∆̂_t
consequently exists and satisfies
∆ = ∆ σ²/( ∆ + σ² ),
i.e., ∆ = 0. This means that the mean squared error E((Xt − X̂t )2 ) =
E((µ − X̂t )2 ) vanishes asymptotically, no matter how the initial values
X̃1 and ∆ ˜ 1 are chosen. Further we have limt→∞ Kt = 0, which means
that additional observations Yt do not contribute to X̂t if t is large.
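The scalar recursion of this example is easily reproduced in a DATA step. The following sketch simulates Y_t = µ + η_t and runs the update step (3.10); the values µ = 5, σ² = 1, the initial values and the sample size are arbitrary choices for illustration:

/* kalman_constant_mean.sas (sketch) */
DATA kalman;
   mu=5; sigma2=1;
   xhat=0; delta=100;                /* initial values */
   DO t=1 TO 50;
      y=mu+SQRT(sigma2)*RANNOR(1);   /* observation Y_t = mu + eta_t */
      k=delta/(delta+sigma2);        /* gain K_t */
      xhat=xhat+k*(y-xhat);          /* updated estimate X^_t */
      delta=delta-k*delta;           /* updated mean squared error */
      OUTPUT;
   END;
RUN;

PROC PRINT DATA=kalman NOOBS;
RUN; QUIT;

Running this sketch shows k and delta decreasing towards zero while xhat settles near µ, in accordance with ∆ = 0 and lim_{t→∞} K_t = 0.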
Finally, we obtain for the mean squared error of the h-step prediction
Ỹ_{t+h} = X̂_t of Y_{t+h} = µ + η_{t+h}
E((Y_{t+h} − Ỹ_{t+h})²) = E((µ − X̂_t)²) + σ² → σ²,  t → ∞.
Example 3.2.4. The following figure displays the Airline Data from
Example 1.3.1 together with 12-step forecasts based on the Kalman
filter. The original data yt , t = 1, . . . , 144 were log-transformed xt =
log(yt ) to stabilize the variance; first order differences ∆xt = xt − xt−1
were used to eliminate the trend and, finally, zt = ∆xt − ∆xt−12
were computed to remove the seasonal component of 12 months. The
Kalman filter was applied to forecast zt , t = 145, . . . , 156, and the
results were transformed in the reverse order of the preceding steps
to predict the original values yt , t = 145, . . . , 156.
Plot 3.2.1: Airline Data and predicted values using the Kalman filter.
1 /* airline_kalman.sas */
2 TITLE1 ’Original and Forecasted Data’;
3 TITLE2 ’Airline Data’;
4
5 /* Read in the data and compute log-transformation */
6 DATA data1;
7 INFILE ’c:\data\airline.txt’;
8 INPUT y;
9 yl=LOG(y);
10 t=_N_;
11
26 /* Graphical options */
27 LEGEND1 LABEL=(’’) VALUE=(’original’ ’forecast’);
28 SYMBOL1 C=BLACK V=DOT H=0.7 I=JOIN L=1;
29 SYMBOL2 C=BLACK V=CIRCLE H=1.5 I=JOIN L=1;
30 AXIS1 LABEL=(ANGLE=90 ’Passengers’);
31 AXIS2 LABEL=(’January 1949 to December 1961’);
32
33 /* Plot data and forecasts */
34 PROC GPLOT DATA=data4;
35 PLOT y*t=1 yhat*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
36 RUN; QUIT;
In the first data step the Airline Data are read into data1. Their logarithm is computed and stored in the variable yl. The variable t contains the observation number. The statement VAR yl(1,12) of the PROC STATESPACE procedure tells SAS to use first order differences of the initial data to remove their trend and to adjust them to a seasonal component of 12 months. The data are identified by the time index set to t. The results are stored in the data set data2 with forecasts of 12 months after the end of the input data. This is invoked by LEAD=12. data3 contains the exponentially transformed forecasts, thereby inverting the log-transformation in the first data step. Finally, the two data sets are merged and displayed in one plot.
Exercises
3.1. Consider the two state-space models
Xt+1 = At Xt + Bt εt+1
Yt = C t X t + η t
and
where (εTt , ηtT , ε̃Tt , η̃tT )T is a white noise. Derive a state-space repre-
sentation for (YtT , ỸtT )T .
3.4. Show that εt and Ys are uncorrelated and that ηt and Ys are
uncorrelated if s < t.
3.6. (Gas Data) Apply PROC STATESPACE to the gas data. Can
they be stationary? Compute the one-step predictors and plot them
together with the actual data.
Chapter 4
The Frequency Domain Approach of a Time Series
The preceding sections focussed on the analysis of a time series in the
time domain, mainly by modelling and fitting an ARMA(p, q)-process
to stationary sequences of observations. Another approach towards
the modelling and analysis of time series is via the frequency domain:
a series is often the sum of a whole variety of cyclic components, of
which we have already added to our model (1.2) a long term cyclic one
and a short term seasonal one.
series can be completely decomposed into cyclic components. Such
cyclic components can be described by their periods and frequencies.
The period is the interval of time required for one cycle to complete.
The frequency of a cycle is its number of occurrences during a fixed
time unit; in electronic media, for example, frequencies are commonly
measured in hertz , which is the number of cycles per second, abbre-
viated by Hz. The analysis of a time series in the frequency domain
aims at the detection of such cycles and the computation of their
frequencies.
Note that in this chapter the results are formulated for any data
y_1, . . . , y_n, which for mathematical reasons need not be generated
by a stationary process. Nevertheless it is reasonable to apply the
results only to realizations of stationary processes, since the empiri-
cal autocovariance function occurring below has no interpretation for
non-stationary processes, see Exercise 1.21.
[SAS output of the least squares fit: analysis of variance and parameter estimates for the model with dependent variable LUMEN]
where
c_ij = Σ_{t=1}^{n} m_i(t) m_j(t).
If c_11 c_22 − c_12 c_21 ≠ 0, the uniquely determined pair of solutions A, B
of these equations is
A = A(λ) = n ( c_22 C(λ) − c_12 S(λ) ) / ( c_11 c_22 − c_12 c_21 ),
B = B(λ) = n ( c_21 C(λ) − c_11 S(λ) ) / ( c_12 c_21 − c_11 c_22 ),
where
C(λ) := (1/n) Σ_{t=1}^{n} (y_t − ȳ) cos(2πλt),
S(λ) := (1/n) Σ_{t=1}^{n} (y_t − ȳ) sin(2πλt).   (4.1)
A = 2C(λ), B = 2S(λ).
Σ_{t=1}^{n} cos(2π(k/n)t) cos(2π(m/n)t) = { n, if k = m = 0 or n/2; n/2, if k = m ≠ 0 and ≠ n/2; 0, if k ≠ m },
Σ_{t=1}^{n} sin(2π(k/n)t) sin(2π(m/n)t) = { 0, if k = m = 0 or n/2; n/2, if k = m ≠ 0 and ≠ n/2; 0, if k ≠ m },
Σ_{t=1}^{n} cos(2π(k/n)t) sin(2π(m/n)t) = 0.
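These orthogonality relations are easily verified numerically; a minimal PROC IML sketch for the cosine case with n = 8 (an arbitrary choice):

PROC IML;
   n=8; pi=CONSTANT('PI'); t=T(1:n);
   cc=J(5,5,0);
   DO k=0 TO 4;
      DO m=0 TO 4;
         cc[k+1,m+1]=SUM( COS(2*pi*(k/n)*t) # COS(2*pi*(m/n)*t) );
      END;
   END;
   PRINT cc;   /* diagonal: 8 for k=0 and k=n/2, 4 otherwise; 0 off-diagonal */
QUIT;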
(sin(2π(k/n)t))1≤t≤n , k = 1, . . . , [n/2],
and
(cos(2π(k/n)t))1≤t≤n , k = 0, . . . , [n/2],
span the space Rn . Note that by Lemma 4.1.2 in the case of n odd
the above 2[n/2] + 1 = n vectors are linearly independent, precisely,
they are orthogonal, whereas in the case of an even sample size n the
vector (sin(2π(k/n)t))1≤t≤n with k = n/2 is the null vector (0, . . . , 0)
and the remaining n vectors are again orthogonal. As a consequence
we obtain that for a given set of data y1 , . . . , yn , there exist in any
case uniquely determined coefficients Ak and Bk , k = 0, . . . , [n/2],
with B0 := 0 and, if n is even, Bn/2 = 0 as well, such that
y_t = Σ_{k=0}^{[n/2]} ( A_k cos(2π(k/n)t) + B_k sin(2π(k/n)t) ),  t = 1, . . . , n.   (4.2)
and, thus,
c_k = y^T v_k / ||v_k||²,  k = 1, . . . , n.
and
Σ_{t=1}^{n} (y_t − ȳ)² = (n/2) Σ_{k=1}^{[n/2]} ( A_k² + B_k² ).
(i) I(0) = 0,
(ii) I(−λ) = I(λ), λ ∈ R,
(iii) I has the period 1.
Proof. Part (i) follows from sin(0) = 0 and cos(0) = 1, while (ii) is a
consequence of cos(−x) = cos(x), sin(−x) = − sin(x), x ∈ R. Part
(iii) follows from the fact that 2π is the fundamental period of sin and
cos.
Theorem 4.2.1 implies that the function I(λ) is completely determined
by its values on [0, 0.5]. The periodogram is often defined on the scale
[0, 2π] instead of [0, 1] by putting I ∗ (ω) := 2I(ω/(2π)); this version is,
for example, used in SAS. In view of Theorem 4.2.4 below we prefer
I(λ), however.
The following figure displays the periodogram of the Star Data from
Example 4.1.1. It has two obvious peaks at the Fourier frequencies
21/600 = 0.035 ≈ 1/28.57 and 25/600 = 1/24 ≈ 0.04167. This
indicates that essentially two cycles with period 24 and 28 or 29 are
inherent in the data. A least squares approach for the determination of
the coefficients Ai , Bi , i = 1, 2 with frequencies 1/24 and 1/29 as done
in Program 4.1.1 (star harmonic.sas) then leads to the coefficients in
Example 4.1.1.
[SAS output: the coefficient COS_01 = 34.1933, followed by a table with columns PERIOD, COS_01, SIN_01, P and LAMBDA for the six largest periodogram values]
The first part of the output is the coefficient Ã0 which is equal to
two times the mean of the lumen data. The results for the Fourier
frequencies with the six greatest periodogram values constitute the
second part of the output. Note that the meaning of COS 01 and
SIN 01 are slightly different from the definitions of Ak and Bk in
(4.3), because SAS lets the index run from 0 to n − 1 instead of 1 to
n.
(ii) f_a(−λ) and f_a(λ) are conjugate complex numbers, i.e., f_a(−λ) equals the complex conjugate of f_a(λ),
1 /* bankruptcy_correlogram.sas */
2 TITLE1 ’Correlogram’;
3 TITLE2 ’Bankruptcy Data’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\bankrupt.txt’;
8 INPUT year bankrupt;
9
10 /* Compute autocorrelation function */
11 PROC ARIMA DATA=data1;
12 IDENTIFY VAR=bankrupt NLAG=64 OUTCOV=corr NOPRINT;
13
14 /* Graphical options */
15 AXIS1 LABEL=(’r(k)’);
16 AXIS2 LABEL=(’k’);
17 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.4 W=1;
18
19 /* Plot auto correlation function */
20 PROC GPLOT DATA=corr;
21 PLOT CORR*LAG / VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
22 RUN; QUIT;
After reading the data from an external file into a data step, the procedure ARIMA calculates the empirical autocorrelation function and stores them into a new data set. The correlogram is generated using PROC GPLOT.
1 /* bankruptcy_periodogram.sas */
2 TITLE1 ’Periodogram’;
3 TITLE2 ’Bankruptcy Data’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\bankrupt.txt’;
8 INPUT year bankrupt;
9
10 /* Compute the periodogram */
11 PROC SPECTRA DATA=data1 P OUT=data2;
12 VAR bankrupt;
13
14 /* Adjusting different periodogram definitions */
15 DATA data3;
16 SET data2(FIRSTOBS=2);
17 p=P_01/2;
18 lambda=FREQ/(2*CONSTANT(’PI’));
19
20 /* Graphical options */
21 SYMBOL1 V=NONE C=GREEN I=JOIN;
22 AXIS1 LABEL=(’I’ F=CGREEK ’(l)’) ;
23 AXIS2 ORDER=(0 TO 0.5 BY 0.05) LABEL=(F=CGREEK ’l’);
24
25 /* Plot the periodogram */
26 PROC GPLOT DATA=data3;
27 PLOT p*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
28 RUN; QUIT;
This program again first reads the data and then starts a spectral analysis by PROC SPECTRA. Due to the reasons mentioned in the comments to Program 4.2.1 (star periodogram.sas) there are some transformations of the periodogram and the frequency values generated by PROC SPECTRA done in data3. The graph results from the statements in PROC GPLOT.
I(λ) = c(0) + 2 Σ_{k=1}^{n−1} c(k) cos(2πλk)
     = Σ_{k=−(n−1)}^{n−1} c(k) e^{−i2πλk}.
Proof. From cos(x_1 − x_2) = cos(x_1)cos(x_2) + sin(x_1)sin(x_2) for x_1, x_2 ∈ R we obtain
I(λ) = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} (y_s − ȳ)(y_t − ȳ) ( cos(2πλs)cos(2πλt) + sin(2πλs)sin(2πλt) )
     = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} a_{st},
where ast := (ys − ȳ)(yt − ȳ) cos(2πλ(s − t)). Since ast = ats and
cos(0) = 1 we have moreover
I(λ) = (1/n) Σ_{t=1}^{n} a_{tt} + (2/n) Σ_{k=1}^{n−1} Σ_{j=1}^{n−k} a_{j,j+k}
     = (1/n) Σ_{t=1}^{n} (y_t − ȳ)² + 2 Σ_{k=1}^{n−1} ( (1/n) Σ_{j=1}^{n−k} (y_j − ȳ)(y_{j+k} − ȳ) ) cos(2πλk)
     = c(0) + 2 Σ_{k=1}^{n−1} c(k) cos(2πλk).
since
∫_0^1 e^{i2πλ(t−s)} dλ = 1 if s = t, and 0 if s ≠ t.
Aliasing
Suppose that we observe a continuous time process (Zt )t∈R only through
its values at k∆, k ∈ Z, where ∆ > 0 is the sampling interval,
i.e., we actually observe (Yk )k∈Z = (Zk∆ )k∈Z . Take, for example,
Zt := cos(2π(9/11)t), t ∈ R. The following figure shows that at
k ∈ Z, i.e., ∆ = 1, the observations Zk coincide with Xk , where
Xt := cos(2π(2/11)t), t ∈ R. With the sampling interval ∆ = 1,
the observations Zk with high frequency 9/11 can, therefore, not be
distinguished from the Xk , which have low frequency 2/11.
1 /* aliasing.sas */
2 TITLE1 ’Aliasing’;
3
4 /* Generate harmonic waves */
5 DATA data1;
6 DO t=1 TO 14 BY .01;
7 y1=COS(2*CONSTANT(’PI’)*2/11*t);
8 y2=COS(2*CONSTANT(’PI’)*9/11*t);
9 OUTPUT;
10 END;
11
12 /* Generate points of intersection */
13 DATA data2;
14 DO t0=1 TO 14;
15 y0=COS(2*CONSTANT(’PI’)*2/11*t0);
16 OUTPUT;
17 END;
18
19 /* Merge the data sets */
20 DATA data3;
21 MERGE data1 data2;
22
23 /* Graphical options */
24 SYMBOL1 V=DOT C=GREEN I=NONE H=.8;
25 SYMBOL2 V=NONE C=RED I=JOIN;
26 AXIS1 LABEL=NONE;
27 AXIS2 LABEL=(’t’);
28
29 /* Plot the curves with point of intersection */
30 PROC GPLOT DATA=data3;
31 PLOT y0*t0=1 y1*t=2 y2*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
32 RUN; QUIT;
In the first data step a tight grid for the cosine waves with frequencies 2/11 and 9/11 is generated. In the second data step the values of the cosine wave with frequency 2/11 are generated just for integer values of t symbolizing the observation points. After merging the two data sets the two waves are plotted using the JOIN option in the SYMBOL statement while the values at the observation points are displayed in the same graph by dot symbols.
Exercises
4.1. Let y(t) = A cos(2πλt) + B sin(2πλt) be a harmonic component.
Show that y can be written as y(t) = α cos(2πλt − ϕ), where α is the
amplitude, i.e., the maximum departure of the wave from zero, and ϕ
is the phase displacement.
4.2. Show that
Σ_{t=1}^{n} cos(2πλt) = { n, if λ ∈ Z; cos(πλ(n + 1)) sin(πλn)/sin(πλ), if λ ∉ Z },
Σ_{t=1}^{n} sin(2πλt) = { 0, if λ ∈ Z; sin(πλ(n + 1)) sin(πλn)/sin(πλ), if λ ∉ Z }.
Hint: Compute Σ_{t=1}^{n} e^{i2πλt}, where e^{iϕ} = cos(ϕ) + i sin(ϕ) is the complex valued exponential function.
4.3. Verify Lemma 4.1.2. Hint: Exercise 4.2.
4.4. Suppose that the time series (yt )t satisfies the additive model
with seasonal component
s(t) = Σ_{k=1}^{s} ( A_k cos(2π(k/s)t) + B_k sin(2π(k/s)t) ).
4.7. (Unemployed1 Data) Plot the periodogram of the first order dif-
ferences of the numbers of unemployed in the building trade as intro-
duced in Example 1.1.1.
4.8. (Airline Data) Plot the periodogram of the variance stabilized
and trend adjusted Airline Data, introduced in Example 1.3.1. Add
a seasonal adjustment and compare the periodograms.
4.9. The contribution of the autocovariance c(k), k ≥ 1, to the pe-
riodogram can be illustrated by plotting the functions ± cos(2πλk),
λ ∈ [0, 0.5].
4.13. (Star Data) Suppose that the Star Data are only observed
weekly (i.e., keep only every seventh observation). Is an aliasing effect
observable?
Chapter 5
The Spectrum of a Stationary Process
In this chapter we investigate the spectrum of a real valued station-
ary process, which is the Fourier transform of its (theoretical) auto-
covariance function. Its empirical counterpart, the periodogram, was
investigated in the preceding sections, cf. Theorem 4.2.3.
Let (Yt )t∈Z be a (real valued) stationary process with an absolutely
summable autocovariance function γ(t), t ∈ Z. Its Fourier transform
f(λ) := Σ_{t∈Z} γ(t) e^{−i2πλt} = γ(0) + 2 Σ_{t∈N} γ(t) cos(2πλt),  λ ∈ R,
For t = 0 we obtain
γ(0) = ∫_0^1 f(λ) dλ,
which shows that the spectrum is a decomposition of the variance
γ(0). In Section 5.3 we will in particular compute the spectrum of
an ARMA-process. As a preparatory step we investigate properties
of spectra for arbitrary absolutely summable filters.
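As a quick numerical illustration, the defining series can be summed for the well-known AR(1) autocovariances γ(t) = σ² a^{|t|}/(1 − a²), which are not derived at this point, and compared with the closed form σ²/(1 − 2a cos(2πλ) + a²) obtained later in Section 5.3; a PROC IML sketch:

PROC IML;
   a=0.5; sigma2=1; pi=CONSTANT('PI'); lambda=0.3;
   g0=sigma2/(1-a**2);                  /* gamma(0) of an AR(1)-process */
   f=g0;
   DO t=1 TO 200;                       /* truncated Fourier series */
      f=f+2*g0*(a**t)*COS(2*pi*lambda*t);
   END;
   fclosed=sigma2/(1-2*a*COS(2*pi*lambda)+a**2);
   PRINT f fclosed;   /* both values agree up to truncation error */
QUIT;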
zero and covariance matrix K (n) . Define now for each n ∈ N and t ∈ Z
a distribution function on Rn by
Ft+1,...,t+n (v1 , . . . , vn ) := P {V1 ≤ v1 , . . . , Vn ≤ vn }.
This defines a family of distribution functions indexed by consecutive
integers. Let now t1 < · · · < tm be arbitrary integers. Choose t ∈ Z
and n ∈ N such that ti = t + ni , where 1 ≤ n1 < · · · < nm ≤ n. We
define now
Ft1 ,...,tm ((vi )1≤i≤m ) := P {Vni ≤ vi , 1 ≤ i ≤ m}.
Note that Ft1 ,...,tm does not depend on the special choice of t and n
and thus, we have defined a family of distribution functions indexed
by t1 < · · · < tm on Rm for each m ∈ N, which obviously satisfies the
consistency condition of Kolmogorov’s theorem. This result implies
the existence of a process (Vt )t∈Z , whose finite dimensional distribution
at t1 < · · · < tm has distribution function Ft1 ,...,tm . This process
has, therefore, mean vector zero and covariances E(Vt+h Vt ) = K(h),
h ∈ Z.
If F(λ) = ∫_0^λ f(x) dx for 0 ≤ λ ≤ 1, then f is called the spectral density of γ.
Note that the property Σ_{h≥0} |γ(h)| < ∞ already implies the existence
of a spectral density of γ, cf. Theorem 4.2.5 and the proof of Corollary 5.1.5.
Recall that γ(0) = ∫_0^1 dF(λ) = F(1) and thus, the autocorrelation
function ρ(h) = γ(h)/γ(0) has the above integral representation, but
with F replaced by the probability distribution function F/γ(0).
Proof of Theorem 5.1.2. We establish first the uniqueness of F . Let
G be another measure generating function with G(λ) = 0 for λ ≤ 0
and constant for λ ≥ 1 such that
γ(h) = ∫_0^1 e^{i2πλh} dF(λ) = ∫_0^1 e^{i2πλh} dG(λ),  h ∈ Z.
Suppose now that γ has the representation (5.3). We have for arbi-
trary xi ∈ R, i = 1, . . . , n
Σ_{1≤r,s≤n} x_r γ(r − s) x_s = ∫_0^1 Σ_{1≤r,s≤n} x_r x_s e^{i2πλ(r−s)} dF(λ)
= ∫_0^1 | Σ_{r=1}^{n} x_r e^{i2πλr} |² dF(λ) ≥ 0,
Put now
F_N(λ) := ∫_0^λ f_N(x) dx,  0 ≤ λ ≤ 1.
Then we have for each h ∈ Z
∫_0^1 e^{i2πλh} dF_N(λ) = Σ_{|m|<N} (1 − |m|/N) γ(m) ∫_0^1 e^{i2πλ(h−m)} dλ
= { (1 − |h|/N) γ(h), if |h| < N; 0, if |h| ≥ N }.   (5.4)
Since FN (1) = γ(0) < ∞ for any N ∈ N, we can apply Helly’s selec-
tion theorem (cf. Billingsley, 1968, page 226ff) to deduce the existence
of a measure generating function F̃ and a subsequence (FNk )k such
that FNk converges weakly to F̃ i.e.,
∫_0^1 g(λ) dF_{N_k}(λ) → ∫_0^1 g(λ) dF̃(λ),  k → ∞,
Proof. Theorem 5.1.2 shows that (i) and (ii) are equivalent. The
assertion is then a consequence of Theorem 5.1.1.
Corollary 5.1.5. A symmetric function γ : Z → R with Σ_{t∈Z} |γ(t)| < ∞
is the autocovariance function of a stationary process iff
f(λ) := Σ_{t∈Z} γ(t) e^{−i2πλt} ≥ 0,  λ ∈ [0, 1].
(see Exercise 5.13 and Theorem 4.2.2). This power transfer function
is plotted in Plot 5.2.1 below. It shows that frequencies λ close to
zero i.e., those corresponding to a large period, remain essentially
unaltered. Frequencies λ close to 0.5, which correspond to a short
period, are, however, damped by the approximate factor 0.1, when the
moving average (au ) is applied to a process. The frequency λ = 1/3
is completely eliminated, since ga (1/3) = 0.
1 /* power_transfer_sma3.sas */
2 TITLE1 ’Power transfer function’;
3 TITLE2 ’of the simple moving average of length 3’;
4
5 /* Compute power transfer function */
6 DATA data1;
7 DO lambda=.001 TO .5 BY .001;
8 g=(SIN(3*CONSTANT(’PI’)*lambda)/(3*SIN(CONSTANT(’PI’)*lambda)))
,→**2;
9 OUTPUT;
10 END;
11
12 /* Graphical options */
13 AXIS1 LABEL=(’g’ H=1 ’a’ H=2 ’(’ F=CGREEK ’l)’);
14 AXIS2 LABEL=(F=CGREEK ’l’);
15 SYMBOL1 V=NONE C=GREEN I=JOIN;
16
17 /* Plot power transfer function */
18 PROC GPLOT DATA=data1;
19 PLOT g*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
20 RUN; QUIT;
f_a(λ) = 1 − e^{−i2πλ}.
Since
f_a(λ) = e^{−iπλ}( e^{iπλ} − e^{−iπλ} ) = i e^{−iπλ} 2 sin(πλ),
Plot 5.2.2: Power transfer function of the first order difference filter.
Since sin2 (x) = 0 iff x = kπ and sin2 (x) = 1 iff x = (2k + 1)π/2,
k ∈ Z, the power transfer function ga(s) (λ) satisfies for k ∈ Z
g_a^{(s)}(λ) = { 0, iff λ = k/s; 4, iff λ = (2k + 1)/(2s) }.
where λ0 is the cut off frequency in the first two cases and [λ0 −
∆, λ0 + ∆] is the cut off interval with bandwidth 2∆ > 0 in the final
one. Therefore, the question naturally arises, whether there actually
exist filters, which have a prescribed power transfer function. One
possible approach for fitting a linear filter with weights au to a given
transfer function f is offered by utilizing least squares. Since only
filters of finite length matter in applications, one chooses a transfer
function
f_a(λ) = Σ_{u=r}^{s} a_u e^{−i2πλu}
with fixed integers r, s and fits this function fa to f by minimizing
the integrated squared error
∫_0^{0.5} |f(λ) − f_a(λ)|² dλ
in (au )r≤u≤s ∈ Rs−r+1 . This is achieved for the choice (Exercise 5.16)
a_u = 2 Re( ∫_0^{0.5} f(λ) e^{i2πλu} dλ ),  u = r, . . . , s,
Example 5.2.5. For the low pass filter with cut off frequency 0 <
λ0 < 0.5 and ideal transfer function
Plot 5.2.4: Transfer function of least squares fitted low pass filter with
cut off frequency λ0 = 1/10 and r = −20, s = 20.
1 /* transfer.sas */
2 TITLE1 ’Transfer function’;
3 TITLE2 ’of least squares fitted low pass filter’;
4
5 /* Compute transfer function */
6 DATA data1;
7 DO lambda=0 TO .5 BY .001;
8 f=2*1/10;
9 DO u=1 TO 20;
10 f=f+2*1/(CONSTANT(’PI’)*u)*SIN(2*CONSTANT(’PI’)*1/10*u)*COS(2*
,→CONSTANT(’PI’)*lambda*u);
11 END;
12 OUTPUT;
13 END;
14
15 /* Graphical options */
16 AXIS1 LABEL=(’f’ H=1 ’a’ H=2 F=CGREEK ’(l)’);
17 AXIS2 LABEL=(F=CGREEK ’l’);
18 SYMBOL1 V=NONE C=GREEN I=JOIN L=1;
19
20 /* Plot transfer function */
21 PROC GPLOT DATA=data1;
22 PLOT f*lambda / VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
23 RUN; QUIT;
The programs in Section 5.2 (Linear Filters and Frequencies) are just made for the purpose of generating graphics, which demonstrate the shape of power transfer functions or, in case of Program 5.2.4 (transfer.sas), of a transfer function. They all consist of two parts, a DATA step and a PROC step. In the DATA step values of the power transfer function are calculated and stored in a variable g by a DO loop over lambda from 0 to 0.5 with a small increment. In Program 5.2.4 (transfer.sas) it is necessary to use a second DO loop within the first one to calculate the sum used in the definition of the transfer function f. Two AXIS statements defining the axis labels and a SYMBOL statement precede the procedure PLOT, which generates the plot of g or f versus lambda.
variance σ 2 . Put
A(z) := 1 − a1 z − a2 z 2 − · · · − ap z p ,
B(z) := 1 + b1 z + b2 z 2 + · · · + bq z q
and suppose that the process (Yt ) satisfies the stationarity condition
(2.4), i.e., the roots of the equation A(z) = 0 are outside of the unit
circle. The process (Yt ) then has the spectral density
f_Y(λ) = σ² |B(e^{−i2πλ})|² / |A(e^{−i2πλ})|²
       = σ² |1 + Σ_{v=1}^{q} b_v e^{−i2πλv}|² / |1 − Σ_{u=1}^{p} a_u e^{−i2πλu}|².   (5.7)
Y_t = aY_{t−1} + ε_t + bε_{t−1}
then has the spectral density
f_Y(λ) = σ² ( 1 + 2b cos(2πλ) + b² ) / ( 1 − 2a cos(2πλ) + a² ).
1 /* arma11_sd.sas */
2 TITLE1 ’Spectral densities of ARMA(1,1)-processes’;
3
4 /* Compute spectral densities of ARMA(1,1)-processes */
5 DATA data1;
6 a=.5;
7 DO b=-.9, -.2, 0, .2, .5;
8 DO lambda=0 TO .5 BY .005;
9 f=(1+2*b*COS(2*CONSTANT(’PI’)*lambda)+b*b)/(1-2*a*COS(2*CONSTANT
,→(’PI’)*lambda)+a*a);
10 OUTPUT;
11 END;
12 END;
13
14 /* Graphical options */
15 AXIS1 LABEL=(’f’ H=1 ’Y’ H=2 F=CGREEK ’(l)’);
16 AXIS2 LABEL=(F=CGREEK ’l’);
17 SYMBOL1 V=NONE C=GREEN I=JOIN L=4;
18 SYMBOL2 V=NONE C=GREEN I=JOIN L=3;
19 SYMBOL3 V=NONE C=GREEN I=JOIN L=2;
20 SYMBOL4 V=NONE C=GREEN I=JOIN L=33;
21 SYMBOL5 V=NONE C=GREEN I=JOIN L=1;
22 LEGEND1 LABEL=(’a=0.5, b=’);
23
Exercises
5.1. Formulate and prove Theorem 5.1.1 for Hermitian functions K
and complex-valued stationary processes. Hint for the sufficiency part:
Let K1 be the real part and K2 be the imaginary part of K. Consider
the real-valued 2n × 2n-matrices
M^{(n)} = (1/2) (  K_1^{(n)}   K_2^{(n)}
                  −K_2^{(n)}   K_1^{(n)} ),   K_l^{(n)} = ( K_l(r − s) )_{1≤r,s≤n},  l = 1, 2.
distributions
Ft+1,...,t+n (v1 , w1 , . . . , vn , wn )
:= P {V1 ≤ v1 , W1 ≤ w1 , . . . , Vn ≤ vn , Wn ≤ wn },
t ∈ Z. By Kolmogorov’s theorem there exists a bivariate Gaussian
process (Vt , Wt )t∈Z with mean vector zero and covariances
E(V_{t+h} V_t) = E(W_{t+h} W_t) = (1/2) K_1(h),
E(V_{t+h} W_t) = −E(W_{t+h} V_t) = (1/2) K_2(h).
Conclude by showing that the complex-valued process Yt := Vt − iWt ,
t ∈ Z, has the autocovariance function K.
5.2. Suppose that A is a real positive semidefinite n × n-matrix, i.e.,
x^T A x ≥ 0 for x ∈ R^n. Show that A is also positive semidefinite for
complex vectors, i.e., z^T A z̄ ≥ 0 for z ∈ C^n.
5.3. Use (5.3) to show that for 0 < a < 0.5
γ(h) = { sin(2πah)/(2πh), h ∈ Z \ {0}; a, h = 0 }
is the autocovariance function of a stationary process. Compute its
spectral density.
5.4. Compute the autocovariance function of a stationary process with
spectral density
f(λ) = ( 0.5 − |λ − 0.5| ) / 0.5²,  0 ≤ λ ≤ 1.
5.5. Suppose that F and G are measure generating functions defined
on some interval [a, b] with F (a) = G(a) = 0 and
∫_{[a,b]} ψ(x) F(dx) = ∫_{[a,b]} ψ(x) G(dx)
5.7. A real valued stationary process (Yt )t∈Z is supposed to have the
spectral density f (λ) = a + bλ, λ ∈ [0, 0.5]. Which conditions must be
satisfied by a and b? Compute the autocovariance function of (Yt )t∈Z .
5.9. Suppose that (Yt )t∈Z and (Zt )t∈Z are stationary processes such
that Yr and Zs are uncorrelated for arbitrary r, s ∈ Z. Denote by FY
and FZ the pertaining spectral distribution functions and put Xt :=
Yt + Zt , t ∈ Z. Show that the process (Xt ) is also stationary and
compute its spectral distribution function.
5.10. Let (Y_t) be a real valued stationary process with spectral dis-
tribution function F. Show that for any function g : [−0.5, 0.5] → C
with ∫_0^1 |g(λ − 0.5)| dF(λ) < ∞
∫_0^1 g(λ − 0.5) dF(λ) = ∫_0^1 g(0.5 − λ) dF(λ).
In particular we have
Hint: Verify the equality first for g(x) = exp(i2πhx), h ∈ Z, and then
use the fact that, on compact sets, the trigonometric polynomials
are uniformly dense in the space of continuous functions, which in
turn form a dense subset in the space of square integrable functions.
Finally, consider the function g(x) = 1[0,ξ] (x), 0 ≤ ξ ≤ 0.5 (cf. the
hint in Exercise 5.5).
5.11. Let (Xt ) and (Yt ) be stationary processes with mean zero and
absolute summable covariance functions. If their spectral densities fX
and fY satisfy fX (λ) ≤ fY (λ) for 0 ≤ λ ≤ 1, show that
(i) Γn,Y −Γn,X is a positive semidefinite matrix, where Γn,X and Γn,Y
are the covariance matrices of (X1 , . . . , Xn )T and (Y1 , . . . , Yn )T
respectively, and
(ii) Var(aT (X1 , . . . , Xn )) ≤ Var(aT (Y1 , . . . , Yn )) for all vectors a =
(a1 , . . . , an )T ∈ Rn .
5.12. Compute the gain function of the filter
a_u = { 1/4, u ∈ {−1, 1}; 1/2, u = 0; 0, elsewhere }.
5.13. The simple moving average
a_u = { 1/(2q + 1), |u| ≤ q; 0, elsewhere }
has the gain function
g_a(λ) = { 1, λ = 0; ( sin((2q + 1)πλ) / ((2q + 1) sin(πλ)) )², λ ∈ (0, 0.5] }.
Is this filter for large q a low pass filter? Plot its power transfer
functions for q = 5, 10, 20. Hint: Exercise 4.2.
5.14. Compute the gain function of the exponential smoothing filter
a_u = { α(1 − α)^u, u ≥ 0; 0, u < 0 },
where 0 < α < 1. Plot this function for various α. What is the effect
of α → 0 or α → 1?
5.15. Let (X_t)_{t∈Z} be a stationary process, let (a_u)_{u∈Z} be an absolutely
summable filter and put Y_t := Σ_{u∈Z} a_u X_{t−u}, t ∈ Z. If (b_w)_{w∈Z} is an-
other absolutely summable filter, then the process Z_t = Σ_{w∈Z} b_w Y_{t−w}
has the spectral distribution function
F_Z(λ) = ∫_0^λ |F_a(µ)|² |F_b(µ)|² dF_X(µ)
Hint: Put f (λ) = f1 (λ) + if2 (λ) and differentiate with respect to au .
5.18. An AR(2)-process
Yt = a1 Yt−1 + a2 Yt−2 + εt
satisfying the stationarity condition (2.4) (cf. Exercise 2.25) has the
spectral density
σ2
fY (λ) = .
1 + a21 + a22 + 2(a1 a2 − a1 ) cos(2πλ) − 2a2 cos(4πλ)
Plot this function for various choices of a1 , a2 .
C_ε(k/n) = (1/n) Σ_{t=1}^{n} (ε_t − ε̄) cos(2π(k/n)t),
S_ε(k/n) = (1/n) Σ_{t=1}^{n} (ε_t − ε̄) sin(2π(k/n)t)
Proof. Put
Vj := Sj − Sj−1 , j = 1, . . . , m.
By Theorem 6.1.3 the vector (V1 , . . . , Vm ) is distributed like the length
of the m consecutive intervals into which [0, 1] is partitioned by the
m − 1 random points U1 , . . . , Um−1 :
Fisher’s Test
The preceding results suggest to test the hypothesis Yt = εt with εt
independent and N (µ, σ 2 )-distributed, by testing for the uniform dis-
tribution on [0, 1]. Precisely, we will reject this hypothesis if Fisher’s
κ-statistic
max1≤j≤m I(j/n)
κm := = mMm
(1/m) m
P
k=1 I(k/n)
is significantly large, i.e., if one of the values I(j/n) is significantly
larger than the average over all. The hypothesis is, therefore, rejected
at error level α if
c
α
κm > cα with 1 − Gm = α.
m
This is Fisher's test for hidden periodicities. Common values are
α = 0.01 and α = 0.05. Table 6.1.1, taken from Fuller (1995), lists
several critical values c_α.
Note that these quantiles can be approximated by the corresponding
quantiles of a Gumbel distribution if m is large (Exercise 6.12).
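A minimal PROC IML sketch of this approximation: under the hypothesis the scaled periodogram ordinates behave like independent standard exponential variables, so κ_m can be mimicked by m such variables, and the (1 − α)-quantile of the Gumbel limit yields the approximate critical value c_α ≈ ln(m) − ln(−ln(1 − α)):

PROC IML;
   CALL RANDSEED(1);
   m=100; alpha=0.05;
   I=J(m,1,.);
   CALL RANDGEN(I,'EXPONENTIAL');      /* stand-ins for the periodogram values */
   kappa=MAX(I)/I[:];                  /* Fisher's kappa = m*M_m */
   c_alpha=LOG(m)-LOG(-LOG(1-alpha));  /* Gumbel-approximate critical value */
   PRINT kappa c_alpha;                /* reject if kappa > c_alpha */
QUIT;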
1 /* airline_whitenoise_plot.sas */
2 TITLE1 ’Visualisation of the test for white noise’;
3 TITLE2 ’for the trend und seasonal adjusted’;
4 TITLE3 ’Airline Data’;
5 /* Note that this program needs data2 generated by the previous
,→program (airline_whitenoise.sas) */
6
7 /* Calculate the sum of the periodogram */
8 PROC MEANS DATA=data2(FIRSTOBS=2) NOPRINT;
9 VAR P_01;
10 OUTPUT OUT=data3 SUM=psum;
11
12 /* Compute empirical distribution function of cumulated periodogram
,→and its confidence bands */
13 DATA data4;
14 SET data2(FIRSTOBS=2);
15 IF _N_=1 THEN SET data3;
16 RETAIN s 0;
17 s=s+P_01/psum;
18 fm=_N_/(_FREQ_-1);
19 yu_01=fm+1.63/SQRT(_FREQ_-1);
20 yl_01=fm-1.63/SQRT(_FREQ_-1);
21 yu_05=fm+1.36/SQRT(_FREQ_-1);
22 yl_05=fm-1.36/SQRT(_FREQ_-1);
23
24 /* Graphical options */
25 SYMBOL1 V=NONE I=STEPJ C=GREEN;
26 SYMBOL2 V=NONE I=JOIN C=RED L=2;
27 SYMBOL3 V=NONE I=JOIN C=RED L=1;
28 AXIS1 LABEL=(’x’) ORDER=(.0 TO 1.0 BY .1);
29 AXIS2 LABEL=NONE;
30
31 /* Plot empirical distribution function of cumulated periodogram with
,→its confidence bands */
32 PROC GPLOT DATA=data4;
33 PLOT fm*s=1 yu_01*fm=2 yl_01*fm=2 yu_05*fm=3 yl_05*fm=3 / OVERLAY
,→HAXIS=AXIS1 VAXIS=AXIS2;
34 RUN; QUIT;
This program uses the data set data2 created by Program 6.1.1 (airline whitenoise.sas), where the first observation belonging to the frequency 0 is dropped. PROC MEANS calculates the sum (keyword SUM) of the SAS periodogram variable P_01 and stores it in the variable psum of the data set data3. The NOPRINT option suppresses the printing of the output. The next DATA step combines every observation of data2 with this sum by means of the IF statement. Furthermore a variable s is initialized with the value 0 by the RETAIN statement and then the portion of each periodogram value from the sum is cumulated. The variable fm contains the values of the empirical distribution function calculated by means of the automatically generated variable _N_ containing the number of observations and the variable _FREQ_, which was created by PROC MEANS and contains the number m. The values of the upper and lower band are stored in the y variables. The last part of this program contains SYMBOL and AXIS statements and PROC GPLOT to visualize the Bartlett-Kolmogorov-Smirnov statistic. The empirical distribution of the cumulated periodogram is represented as a step function due to the I=STEPJ option in the SYMBOL1 statement.
with Ȳ_n := (1/n) Σ_{t=1}^{n} Y_t and the sample autocovariance function
c(h) = (1/n) Σ_{t=1}^{n−|h|} ( Y_t − Ȳ_n )( Y_{t+|h|} − Ȳ_n ).
I_n(k/n) = Σ_{|h|<n} ( (1/n) Σ_{t=1}^{n−|h|} (Y_t − µ)(Y_{t+|h|} − µ) ) e^{−i2π(k/n)h}
= (1/n) Σ_{t=1}^{n} (Y_t − µ)²   (6.3)
+ 2 Σ_{h=1}^{n−1} ( (1/n) Σ_{t=1}^{n−|h|} (Y_t − µ)(Y_{t+|h|} − µ) ) cos(2π(k/n)h).
E(I_n(λ)) = Σ_{|h|<n} ( (1/n) Σ_{t=1}^{n−|h|} E((Y_t − µ)(Y_{t+|h|} − µ)) ) e^{−i2πg_n(λ)h}
= Σ_{|h|<n} ( 1 − |h|/n ) γ(h) e^{−i2πg_n(λ)h}.
Since Σ_{h∈Z} |γ(h)| < ∞, the series Σ_{|h|<n} γ(h) exp(−i2πλh) converges
to f(λ) uniformly for 0 ≤ λ ≤ 0.5. Kronecker's Lemma (Exercise 6.8)
implies moreover
| Σ_{|h|<n} (|h|/n) γ(h) e^{−i2πλh} | ≤ Σ_{|h|<n} (|h|/n) |γ(h)| → 0,  n → ∞,
so that f_n(λ) := Σ_{|h|<n} (1 − |h|/n) γ(h) e^{−i2πλh} converges to f(λ) uniformly in λ as well. From g_n(λ) → λ as n → ∞ and the
continuity of f we obtain for λ ∈ (0, 0.5]
| E(I_n(λ)) − f(λ) | = | f_n(g_n(λ)) − f(λ) |
≤ | f_n(g_n(λ)) − f(g_n(λ)) | + | f(g_n(λ)) − f(λ) | → 0,  n → ∞.
Note that |gn (λ) − λ| ≤ 1/(2n). The uniform convergence in case of
µ = 0 then follows from the uniform convergence of gn (λ) to λ and
the uniform continuity of f on the compact interval [0, 0.5].
We have
E(Z_s Z_t Z_u Z_v) = { ησ⁴, if s = t = u = v; σ⁴, if s = t ≠ u = v, s = u ≠ t = v, or s = v ≠ t = u; 0, elsewhere }
and
e^{−i2π(j/n)(s−t)} e^{−i2π(k/n)(u−v)} = { 1, if s = t, u = v; e^{−i2π((j+k)/n)s} e^{i2π((j+k)/n)t}, if s = u, t = v; e^{−i2π((j−k)/n)s} e^{i2π((j−k)/n)t}, if s = v, t = u }.
This implies
E(I_n(j/n) I_n(k/n))
= ησ⁴/n + (σ⁴/n²) { n(n − 1) + | Σ_{t=1}^{n} e^{−i2π((j+k)/n)t} |² + | Σ_{t=1}^{n} e^{−i2π((j−k)/n)t} |² − 2n }
= (η − 3)σ⁴/n + σ⁴ { 1 + (1/n²) | Σ_{t=1}^{n} e^{i2π((j+k)/n)t} |² + (1/n²) | Σ_{t=1}^{n} e^{i2π((j−k)/n)t} |² },
from which (6.7) and (6.8) follow by using (4.5) on page 142.
Remark 6.2.3. Theorem 6.2.2 can be generalized to filtered processes
Y_t = Σ_{u∈Z} a_u Z_{t−u}, with (Z_t)_{t∈Z} as in Theorem 6.2.2. In this case one
has to replace σ², which equals by Example 5.1.3 the constant spectral
density f_Z(λ), in (i) by the spectral density f_Y(λ_i), 1 ≤ i ≤ r. If in
addition Σ_{u∈Z} |a_u| |u|^{1/2} < ∞, then we have in (ii) the expansions
Var(I_n(k/n)) = { 2f_Y²(k/n) + O(n^{−1/2}), if k = 0 or k = n/2; f_Y²(k/n) + O(n^{−1/2}), elsewhere }
and
Cov(I_n(j/n), I_n(k/n)) = O(n^{−1}),  j ≠ k,
where In is the periodogram pertaining to Y1 , . . . , Yn . The above terms
O(n−1/2 ) and O(n−1 ) are uniformly bounded in k and j by a constant
C.
We omit the highly technical proof and refer to Brockwell and Davis
(1991, Section 10.3). Recall that the class of processes Y_t = Σ_{u∈Z} a_u Z_{t−u} is a fairly rich
one, which contains in particular ARMA-processes, see Section 2.2
and Remark 2.1.12.
(ii) lim_{n→∞} Cov( f̂_n(λ), f̂_n(µ) ) / Σ_{|j|≤m} a²_{jn}
= { 2f²(λ), if λ = µ = 0 or 0.5; f²(λ), if 0 < λ = µ < 0.5; 0, if λ ≠ µ }.
Condition (iv) in (6.11) on the weights together with (ii) in the preceding result entails that Var(f̂_n(λ)) → 0 as n → ∞ for any λ ∈ [0, 0.5]. Together
with (i) we, therefore, obtain that the mean squared error of f̂_n(λ)
vanishes asymptotically:
MSE(f̂_n(λ)) = E( ( f̂_n(λ) − f(λ) )² ) = Var(f̂_n(λ)) + Bias²(f̂_n(λ)) → 0,  n → ∞.
max_{|j|≤m} | E(I_n(g_n(λ) + j/n)) − f(g_n(λ) + j/n) | < ε/2
Repeating the arguments in the proof of part (i) one shows that
X X
2 2 2
S1 (λ) = ajn f (λ) + o ajn .
|j|≤m |j|≤m
Thus we established the assertion of part (ii) also in the case 0 < λ =
µ < 0.5. The remaining cases λ = µ = 0 and λ = µ = 0.5 are shown
in a similar way (Exercise 6.17).
The preceding result requires zero mean variables Yt . This might, how-
ever, be too restrictive in practice. Due to (4.5), the periodograms
of (Yt )1≤t≤n , (Yt − µ)1≤t≤n and (Yt − Ȳ )1≤t≤n coincide at Fourier fre-
quencies different from zero. At frequency λ = 0, however, they
will differ in general. To estimate f (0) consistently also in the case
µ = E(Y_t) ≠ 0, one puts
f̂_n(0) := a_{0n} I_n(1/n) + 2 Σ_{j=1}^{m} a_{jn} I_n((1 + j)/n).   (6.12)
Each time the value In (0) occurs in the moving average (6.9), it is
replaced by fˆn (0). Since the resulting estimator of the spectral density
involves only Fourier frequencies different from zero, we can assume
without loss of generality that the underlying variables Yt have zero
mean.
1 /* sunspot_dsae.sas */
22 /* Graphical options */
23 SYMBOL1 I=JOIN C=RED V=NONE L=1;
24 AXIS1 LABEL=(F=CGREEK ’l’) ORDER=(0 TO .5 BY .05);
25 AXIS2 LABEL=NONE;
26
27 /* Plot periodogram and estimated spectral density */
28 PROC GPLOT DATA=data3;
29 PLOT p*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
30 PLOT s*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
31 RUN; QUIT;
In the DATA step the data of the sunspots are read into the variable num. Then PROC SPECTRA is applied to this variable, whereby the options P (see Program 6.1.1, airline whitenoise.sas) and S generate a data set stored in data2 containing the periodogram data and the estimation of the spectral density which SAS computes with the weights given in the WEIGHTS statement. Note that SAS automatically normalizes these weights. In the following DATA step the slightly different definition of the periodogram by SAS is being adjusted to the definition used here (see Program 4.2.1, star periodogram.sas). Both plots are then printed with PROC GPLOT.
be an arbitrary sequence of integers with m/n → 0 as n → ∞ and put
a_{jn} := K(j/m) / Σ_{i=−m}^{m} K(i/m),  −m ≤ j ≤ m.   (6.13)
These weights satisfy the conditions (6.11) (Exercise 6.18).
Take for example K(x) := 1 − |x|, −1 ≤ x ≤ 1. Then we obtain
a_{jn} = (m − |j|)/m²,  −m ≤ j ≤ m.
Example 6.2.6. (i) The truncated kernel is defined by
K_T(x) = { 1, |x| ≤ 1; 0, elsewhere }.
1 /* ma1_blackman_tukey.sas */
2 TITLE1 ’Spectral density and Blackman-Tukey estimator’;
3 TITLE2 ’of MA(1)-process’;
4
5 /* Generate MA(1)-process */
6 DATA data1;
7 DO t=0 TO 160;
8 e=RANNOR(1);
9 y=e-.6*LAG(e);
10 OUTPUT;
11 END;
12
13 /* Estimation of spectral density */
14 PROC SPECTRA DATA=data1(FIRSTOBS=2) S OUT=data2;
15 VAR y;
16 WEIGHTS TUKEY 10 0;
17 RUN;
18
19 /* Adjusting different definitions */
20 DATA data3;
21 SET data2;
22 lambda=FREQ/(2*CONSTANT(’PI’));
23 s=S_01/2*4*CONSTANT(’PI’);
24
25 /* Compute underlying spectral density */
26 DATA data4;
27 DO l=0 TO .5 BY .01;
28 f=1-1.2*COS(2*CONSTANT(’PI’)*l)+.36;
29 OUTPUT;
30 END;
31
32 /* Merge the data sets */
33 DATA data5;
34 MERGE data3(KEEP=s lambda) data4;
35
36 /* Graphical options */
37 AXIS1 LABEL=NONE;
38 AXIS2 LABEL=(F=CGREEK ’l’) ORDER=(0 TO .5 BY .1);
39 SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
40 SYMBOL2 I=JOIN C=RED V=NONE L=3;
41
42 /* Plot underlying and estimated spectral density */
43 PROC GPLOT DATA=data5;
44 PLOT f*l=1 s*lambda=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
45 RUN; QUIT;
In the first DATA step the realizations of an MA(1)-process with the given parameters are created. Thereby the function RANNOR, which generates standard normally distributed data, and LAG, which accesses the value of e of the preceding loop, are used. As in Program 6.2.1 (sunspot dsae.sas) PROC SPECTRA computes the estimator of the spectral density (after dropping the first observation) by the option S and stores them in data2. The weights used here come from the Tukey–Hanning kernel with a specified bandwidth of m = 10. The second number after the TUKEY option can be used to refine the choice of the bandwidth. Since this is not needed here it is set to 0. The next DATA step adjusts the different definitions of the spectral density used here and by SAS (see Program 4.2.1, star periodogram.sas). The following DATA step generates the values of the underlying spectral density. These are merged with the values of the estimated spectral density and then displayed by PROC GPLOT.
and
ν = 2 / Σ_{|j|≤m} a²_{jn}.
This interval has constant length log(χ21−α/2 (ν)/χ2α/2 (ν)). Note that
Cν,α (k/n) is a level (1 − α)-confidence interval only for log(f (λ)) at
a fixed Fourier frequency λ = k/n, with 0 < k < [n/2], but not
simultaneously for λ ∈ (0, 0.5).
1 /* ma1_logdsae.sas */
2 TITLE1 ’Logarithms of spectral density,’;
3 TITLE2 ’of their estimates and confidence intervals’;
4 TITLE3 ’of MA(1)-process’;
5
6 /* Generate MA(1)-process */
7 DATA data1;
8 DO t=0 TO 160;
9 e=RANNOR(1);
10 y=e-.6*LAG(e);
11 OUTPUT;
12 END;
13
14 /* Estimation of spectral density */
15 PROC SPECTRA DATA=data1(FIRSTOBS=2) S OUT=data2;
16 VAR y; WEIGHTS 1 3 6 9 12 15 18 20 21 21 21 20 18 15 12 9 6 3 1;
17 RUN;
18
19 /* Adjusting different definitions and computation of confidence bands
,→ */
20 DATA data3; SET data2;
21 lambda=FREQ/(2*CONSTANT(’PI’));
22 log_s_01=LOG(S_01/2*4*CONSTANT(’PI’));
23 nu=2/(3763/53361);
24 c1=log_s_01+LOG(nu)-LOG(CINV(.975,nu));
25 c2=log_s_01+LOG(nu)-LOG(CINV(.025,nu));
26
27 /* Compute underlying spectral density */
28 DATA data4;
29 DO l=0 TO .5 BY 0.01;
30 log_f=LOG((1-1.2*COS(2*CONSTANT(’PI’)*l)+.36));
31 OUTPUT;
32 END;
33
34 /* Merge the data sets */
35 DATA data5;
36 MERGE data3(KEEP=log_s_01 lambda c1 c2) data4;
37
38 /* Graphical options */
39 AXIS1 LABEL=NONE;
40 AXIS2 LABEL=(F=CGREEK ’l’) ORDER=(0 TO .5 BY .1);
41 SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
42 SYMBOL2 I=JOIN C=RED V=NONE L=2;
43 SYMBOL3 I=JOIN C=GREEN V=NONE L=33;
44
45 /* Plot underlying and estimated spectral density */
46 PROC GPLOT DATA=data5;
47 PLOT log_f*l=1 log_s_01*lambda=2 c1*lambda=3 c2*lambda=3 / OVERLAY
,→VAXIS=AXIS1 HAXIS=AXIS2;
48 RUN; QUIT;
This program starts identically to Program 6.2.2 (ma1 blackman tukey.sas) with the generation of an MA(1)-process and of the computation of the spectral density estimator. Only this time the weights are directly given to SAS. In the next DATA step the usual adjustment of the frequencies is done. This is followed by the computation of ν according to its definition. The logarithm of the confidence intervals is calculated with the help of the function CINV which returns quantiles of a χ²-distribution with ν degrees of freedom. The rest of the program which displays the logarithm of the estimated spectral density, of the underlying density and of the confidence intervals is analogous to Program 6.2.2 (ma1 blackman tukey.sas).
Exercises
6.1. For independent random variables X, Y having continuous dis-
tribution functions it follows that P {X = Y } = 0. Hint: Fubini’s
theorem.
where εt are independent and standard normal. Plot the data and the
periodogram. Is the hypothesis Yt = εt rejected at level α = 0.01?
6.7. (Share Data) Test the hypothesis that the share data were gener-
ated by independent and identically normal distributed random vari-
ables and plot the periodogramm. Plot also the original data.
6.8. (Kronecker's lemma) Let (a_j)_{j≥0} be an absolutely summable complex valued filter. Show that lim_{n→∞} Σ_{j=0}^{n} (j/n)|a_j| = 0.
(ii) ∫ x^{2k} dN(0, σ²)(x) = 1 · 3 · · · (2k − 1) σ^{2k},  k ∈ N.
(iii) ∫ |x|^{2k+1} dN(0, σ²)(x) = ( 2^{k+1}/√(2π) ) k! σ^{2k+1},  k ∈ N ∪ {0}.
(i) Xn + Yn →D X + c.
(ii) Xn Yn →D cX.
(iii) X_n/Y_n →_D X/c, if c ≠ 0.
This entails in particular that stochastic convergence implies conver-
gence in distribution. The reverse implication is not true in general.
Give an example.
6.12. Show that the distribution function Fm of Fisher’s test statistic
κm satisfies under the condition of independent and identically normal
observations εt
m→∞
Fm (x + ln(m)) = P {κm ≤ x + ln(m)} −→ exp(−e−x ) =: G(x), x ∈ R.
6.15. (Monte Carlo Simulation) For m large we have under the hypothesis
P{ √(m − 1) ∆_{m−1} > c_α } ≈ α.
For different values of m (> 30) generate 1000 times the test statistic
√(m − 1) ∆_{m−1} based on independent random variables and check how
often this statistic exceeds the critical values c_{0.05} = 1.36 and c_{0.01} =
1.63. Hint: Exercise 6.14.
6.16. In the situation of Theorem 6.2.4 show that the spectral density
f of (Yt )t is continuous.
6.17. Complete the proof of Theorem 6.2.4 (ii) for the remaining cases
λ = µ = 0 and λ = µ = 0.5.
6.18. Verify that the weights (6.13) defined via a kernel function sat-
isfy the conditions (6.11).
6.22. Compute the length of the confidence interval Cν,α (k/n) for
fixed α (preferably α = 0.05) but for various ν. For the calculation of
ν use the weights generated by the kernel K(x) = 1 − |x|, −1 ≤ x ≤ 1
(see equation (6.13)).
where
K_n(y) = { n, y ∈ Z; (1/n)( sin(πyn)/sin(πy) )², y ∉ Z }.
(i) K_n(y) ≥ 0,
6.24. (Nile Data) Between 715 and 1284 the river Nile had its lowest
annual minimum levels. These data are among the longest time series
in hydrology. Can the trend removed Nile Data be considered as being
generated by a white noise, or are there hidden periodicities? Estimate
the spectral density in this case. Use discrete spectral estimators as
well as lag window spectral density estimators. Compare with the
spectral density of an AR(1)-process.
Chapter 7
The Box–Jenkins Program: A Case Study
This chapter deals with the practical application of the Box–Jenkins
Program to the Donauwoerth Data, consisting of 7300 discharge mea-
surements from the Donau river at Donauwoerth, specified in cubic
centimeter per second and taken on behalf of the Bavarian State Of-
fice For Environment between January 1st, 1985 and December 31st,
2004. For the purpose of studying, the data have been kindly made
available to the University of Würzburg.
As introduced in Section 2.3, the Box–Jenkins methodology can be
applied to specify an adequate ARMA(p, q)-model Y_t = a_1 Y_{t−1} + · · · +
a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}, t ∈ Z, for the Donauwoerth data
in order to forecast future values of the time series. In short, the
original time series will be adjusted to represent a possible realization
of such a model. Based on the identification methods MINIC, SCAN
and ESACF, appropriate pairs of orders (p, q) are chosen in Section
7.5 and the corresponding model coefficients a1 , . . . , ap and b1 , . . . , bq
are determined. Finally, it is demonstrated by Diagnostic Checking in
Section 7.6 that the resulting model is adequate and forecasts based
on this model are executed in the concluding Section 7.7.
Yet, before starting the program in Section 7.4, some theoretical
preparations have to be carried out. In the first Section 7.1 we intro-
duce the general definition of the partial autocorrelation leading to
the Levinson–Durbin-Algorithm, which will be needed for Diagnostic
Checking. In order to verify whether pure AR(p)- or MA(q)-models
might be appropriate candidates for explaining the sample, we de-
rive in Section 7.2 and 7.3 asymptotic normal behaviors of suitable
estimators of the partial and general autocorrelations.
Partial Correlation
The partial correlation of two square integrable, real valued random
variables X and Y , holding the random variables Z1 , . . . , Zm , m ∈ N,
fixed, is defined as Corr( X − X̂_{Z_1,...,Z_m}, Y − Ŷ_{Z_1,...,Z_m} ),
provided that Var(X − X̂Z1 ,...,Zm ) > 0 and Var(Y − ŶZ1 ,...,Zm ) > 0,
where X̂Z1 ,...,Zm and ŶZ1 ,...,Zm denote best linear approximations of X
and Y based on Z1 , . . . , Zm , respectively.
Let (Yt )t∈Z be an ARMA(p, q)-process satisfying the stationarity con-
dition (2.4) with expectation E(Yt ) = 0 and variance γ(0) > 0. The
partial autocorrelation α(t, k) for k > 1 is the partial correlation
of Yt and Yt−k , where the linear influence of the intermediate vari-
ables Yi , t − k < i < t, is removed, i.e., the best linear approxima-
tion of Yt based Pk−1on the k − 1 preceding process variables, denoted
by Ŷt,k−1 := i=1 âi Yt−i . Since Ŷt,k−1 minimizes the mean squared
error E((Yt − Ỹt,a,k−1 )2 ) among all linear combinations Ỹt,a,k−1 :=
Pk−1 T k−1
i=1 ai Yt−i , a := (a1 , . . . , ak−1 ) ∈ R of Yt−k+1 , . . . , Yt−1 , we find,
Pk−1
due to the stationarity condition, that Ŷt−k,−k+1 := i=1 âi Yt−k+i is
a best linear approximation of Yt−k based on the k − 1 subsequent
process variables. Setting Ŷt,0 = 0 in the case of k = 1, we obtain, for
k > 0,
α(t, k) := Corr( Y_t − Ŷ_{t,k−1}, Y_{t−k} − Ŷ_{t−k,−k+1} )
= Cov( Y_t − Ŷ_{t,k−1}, Y_{t−k} − Ŷ_{t−k,−k+1} ) / Var( Y_t − Ŷ_{t,k−1} ).
Note that Var(Y_t − Ŷ_{t,k−1}) > 0 is provided by the preliminary con-
ditions, which will be shown later by the proof of Theorem 7.1.1.
Observe moreover that Ŷ_{t+h,k−1} = Σ_{i=1}^{k−1} â_i Y_{t+h−i} for all h ∈ Z, imply-
ing α(t, k) = Corr(Y_t − Ŷ_{t,k−1}, Y_{t−k} − Ŷ_{t−k,−k+1}) = Corr(Y_k − Ŷ_{k,k−1}, Y_0 −
Ŷ_{0,−k+1}) = α(k, k). Consequently the partial autocorrelation function
can be more conveniently written as
α(k) := Cov( Y_k − Ŷ_{k,k−1}, Y_0 − Ŷ_{0,−k+1} ) / Var( Y_k − Ŷ_{k,k−1} )   (7.1)
for k > 0 and α(0) = 1 for k = 0. For negative k, we set α(k) :=
α(−k).
The determination of the partial autocorrelation coefficient α(k) at lag
k > 1 entails the computation of the coefficients of the correspond-
ing best linear approximation Ŷk,k−1 , leading to an equation system
similar to the normal equations coming from a regression model. Let
Ỹk,a,k−1 = a1 Yk−1 + · · · + ak−1 Y1 be an arbitrary linear approximation
of Yk based on Y1 , . . . , Yk−1 . Then, the mean squared error is given by
E((Y_k − Ỹ_{k,a,k−1})²) = E(Y_k²) − 2 E(Y_k Ỹ_{k,a,k−1}) + E(Ỹ²_{k,a,k−1})
= E(Y_k²) − 2 Σ_{l=1}^{k−1} a_l E(Y_k Y_{k−l}) + Σ_{i=1}^{k−1} Σ_{j=1}^{k−1} a_i a_j E(Y_{k−i} Y_{k−j})
or, respectively,
( â_1, â_2, . . . , â_{k−1} )^T = P_{k−1}^{−1} ( ρ(1), ρ(2), . . . , ρ(k − 1) )^T   (7.3)
if P_{k−1} is regular. A best linear approximation Ŷ_{k,k−1} of Y_k based
on Y_1, . . . , Y_{k−1} obviously has to share this necessary condition (7.2).
Since Ŷ_{k,k−1} equals a best linear one-step forecast of Y_k based on
Y_1, . . . , Y_{k−1}, Lemma 2.3.2 shows that Ŷ_{k,k−1} = â^T y is a best linear
approximation of Y_k. Thus, if P_{k−1} is regular, then Ŷ_{k,k−1} is given by
Ŷ_{k,k−1} = â^T y = V_{Y_k y} V_{yy}^{−1} y.   (7.4)
The next Theorem will now establish the necessary regularity of Pk .
This shows the boundedness of λ_i^{(t)} for a fixed i. On the other hand,
γ(0) = Cov( Y_{t+κ+1}, Σ_{i=1}^{κ} λ_i^{(t)} Y_i ) ≤ Σ_{i=1}^{κ} |λ_i^{(t)}| |γ(t + κ + 1 − i)|.
P(k − 1) := Var( Y_k − Ŷ_{k,k−1} ) / Var(Y_k),   (7.5)
if Var(Y_k) > 0. Observe that the greater the power, the less precise
the approximation Ŷ_{k,k−1} performs. Let again (Y_t)_{t∈Z} be an ARMA(p, q)-
process satisfying the stationarity condition with expectation E(Y_t) =
0 and variance Var(Y_t) > 0. Furthermore, let Ŷ_{k,k−1} := Σ_{u=1}^{k−1} â_u(k − 1) Y_{k−u}
denote the best linear approximation of Y_k based on the k − 1
preceding random variables for k > 1. Then, equation (7.4) and
Theorem 7.1.1 provide
P(k − 1) := Var( Y_k − Ŷ_{k,k−1} ) / Var(Y_k) = Var( Y_k − V_{Y_k y} V_{yy}^{−1} y ) / Var(Y_k)
= (1/Var(Y_k)) ( Var(Y_k) + Var(V_{Y_k y} V_{yy}^{−1} y) − 2 E(Y_k V_{Y_k y} V_{yy}^{−1} y) )
= (1/γ(0)) ( γ(0) + V_{Y_k y} V_{yy}^{−1} V_{yy} V_{yy}^{−1} V_{y Y_k} − 2 V_{Y_k y} V_{yy}^{−1} V_{y Y_k} )
= (1/γ(0)) ( γ(0) − V_{Y_k y} â(k − 1) ) = 1 − Σ_{i=1}^{k−1} ρ(i) â_i(k − 1),   (7.6)
where â(k − 1) := ( â_1(k − 1), â_2(k − 1), . . . , â_{k−1}(k − 1) )^T. Note that
P(k − 1) ≠ 0 is provided by the proof of the previous Theorem 7.1.1.
where P(k − 1) = 1 − Σ_{i=1}^{k−1} ρ(i) â_i(k − 1) denotes the power of approx-
imation and where the equation system of order k reads
P_k ( â_1(k), â_2(k), . . . , â_k(k) )^T = ( ρ(1), ρ(2), . . . , ρ(k) )^T,
with P_k = ( ρ(i − j) )_{1≤i,j≤k}.
and
ρ(k − 1)â1 (k) + ρ(k − 2)â2 (k) + · · · + ρ(1)âk−1 (k) + âk (k) = ρ(k).
(7.8)
Multiplying P_{k−1}^{−1} to the left we get
( â_1(k), â_2(k), . . . , â_{k−1}(k) )^T
= ( â_1(k−1), â_2(k−1), . . . , â_{k−1}(k−1) )^T − â_k(k) ( â_{k−1}(k−1), â_{k−2}(k−1), . . . , â_1(k−1) )^T.
This is the central recursion equation system (ii). Applying the corre-
sponding terms of â1 (k), . . . , âk−1 (k) from the central recursion equa-
7.1 Partial Correlation and Levinson–Durbin Recursion 231
The last step follows from (7.2), which implies that Yk − Ŷk,k−1 is
uncorrelated with Y1 , . . . , Yk−1 . Setting Ŷk,k−1 = k−1
P
u=1 âu (k − 1)Yk−u ,
we attain for the numerator
Cov(Yk − Ŷk,k−1 , Y0 ) = Cov(Yk , Y0 ) − Cov(Ŷk,k−1 , Y0 )
k−1
X
= γ(k) − âu (k − 1) Cov(Yk−u , Y0 )
u=1
k−1
X
= γ(0)(ρ(k) − âu (k − 1)ρ(k − u)).
u=1
Proof. The assertion follows directly from the previous Lemma 7.1.4.
Cramer–Wold Device
Definition 7.2.1. A sequence of real valued random variables (Yn )n∈N ,
defined on some probability space (Ω, A, P), is said to be asymptoti-
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 235
cally normal with asymptotic mean µn and asymptotic variance σn2 >
D
0 for sufficiently large n, written as Yn ≈ N (µn , σn2 ), if σn−1 (Yn − µn ) →
Y as n → ∞, where Y is N(0,1)-distributed.
(iv) lim supn→∞ P ({Yn ∈ C}) ≤ P ({Y ∈ C}) for any closed set
C ⊂ Rk ,
(v) lim inf n→∞ P ({Yn ∈ O}) ≥ P ({Y ∈ O}) for any open set O ⊂
Rk .
236 The Box–Jenkins Program: A Case Study
Proof. (i) ⇒ (ii): Suppose that F (x), Fn (x) denote the correspond-
ing distribution functions of real valued random k-vectors Y , Yn for
all n ∈ N, respectively, satisfying Fn (x) → F (x) as n → ∞ for ev-
ery continuity point x of F (x). Let ϑ : Rk → R be a bounded and
continuous function, which is obviously bounded by the finite value
B := supx {|ϑ(x)|}. Now, given ε > 0, we find, due to the right-
continuousness, continuity points ±C := ±(C1 , . . . , Ck )T of F (x),
with Cr 6= 0 for r = 1, . . . , k and a compact set K := {(x1 , . . . , xk ) :
−Cr ≤ xr ≤ Cr , r = 1, . . . , k} such that P {Y ∈ / K} < ε/B. Note
that this entails P {Yn ∈ / K} < 2ε/B for n sufficiently large. Now, we
choose l ≥ 2 continuity points xj := (xj1 , . . . , xjk )T , j ∈ {1, . . . , l},
of F (x) such that −Cr = x1r < · · · < xlr = Cr for each r ∈
{1, . . . , k} and such that supx∈K |ϑ(x) − ϕ(x)| < ε, where ϕ(x) :=
Pl−1
i=1 ϑ(xi )1(xi ,xi+1 ] (x). Then, we attain, for n sufficiently large,
as n → ∞.
(ii) ⇒ (iv): Let C ⊂ Rk be a closed set. We define ψC (y) := inf{||y −
x|| : x ∈ C}, ψ : Rk → R, which is a continuous function, as well as
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 237
|φXn (t) − φY (t)| ≤ |φXn (t) − φYmn (t)| + |φYmn (t) − φYm (t)|
+ |φYm (t) − φY (t)|, (7.12)
where the first term on the right-hand side satisfies, for δ > 0,
which implies
Moreover, we find
|φXn (t) − φY (t)| ≤ |φXn (t) − φYn (t)| + |φYn (t) − φY (t)|
→ 0 as n → ∞.
Due to the weak law of large numbers, which provides the conver-
1
Pn P P
gence
P in probability n t=1 Zt−u → µ as n → ∞,P we find Ynm →
µ |u|≤m bu as n → ∞. Defining now Ym := µ |u|≤m bu , entailing
Ym → µ ∞
P
u=−∞ bu as m → ∞, it remains to show that
2
P∞
The first term converges
P∞ in probability to
P∞σ ( Pb∞
u=−∞ u bu+k ) = γ(k)
by Lemma 7.2.12 and u=−∞ |bu bu+k | ≤ u=−∞ |bu | w=−∞ |bw+k | <
∞. It remains to show that
n ∞
1X X X P
Wn := bu bw Zt−u Zt−w+k → 0
n t=1 u=−∞
w6=u−k
as n → ∞. We approximate Wn by
n
1X X X
Wnm := bu bw Zt−u Zt−w+k .
n t=1
|u|≤m |w|≤m,w6=u+k
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 245
P
for every ε > 0 in order to establish Wn → 0 as n → ∞. Applying
Markov’s inequality, we attain
This shows
since bu → 0 as u → ∞.
Definition 7.2.14. A sequence (Yt )t∈Z of square integrable, real val-
ued random variables is said to be strictly stationary if (Y1 , . . . , Yk )T
and (Y1+h , . . . , Yk+h )T have the same joint distribution for all k > 0
and h ∈ Z.
Observe that strict stationarity implies (weak) stationarity.
Definition 7.2.15. A strictly stationary sequence (Yt )t∈Z of square
integrable, real valued random variables is said to be m-dependent,
m ≥ 0, if the two sets {Yj |j ≤ k} and {Yi |i ≥ k + m + 1} are
independent for each k ∈ Z.
In the special case of m = 0, m-dependence reduces to independence.
Considering especially a MA(q)-process, we find m = q.
246 The Box–Jenkins Program: A Case Study
Proof. We have
n n n n
√ 1 X 1 X 1 XX
Var( nȲn ) = n E Yi Yj = γ(i − j)
n i=1
n j=1
n i=1 j=1
1
= 2γ(n − 1) + 4γ(n − 2) + · · · + 2(n − 1)γ(1) + nγ(0)
n
n−1 1
= γ(0) + 2 γ(1) + · · · + 2 γ(n − 1)
n n
X |k|
= 1− γ(k).
n
|k|<n,k∈Z
(p)
To prove (ii) we define random variables Ỹi := (Yi+1 +· · ·+Yi+p ), i ∈
N0 := N ∪ {0}, as the sum of p, p > m, sequent variables taken from
(p)
the sequence (Yt )t∈Z . Each Ỹi has expectation zero and variance
p X
p
(p)
X
Var(Ỹi ) = γ(l − j)
l=1 j=1
= pγ(0) + 2(p − 1)γ(1) + · · · + 2(p − m)γ(m).
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 247
√
with variance Var( nȲn − √1n Yrp ) = n1 ((r − 1) Var(Y1 + · · · + Ym ) +
Var(Y1 + · · · + Ym+n−r(p+m) )). From Chebychev’s inequality, we know
√ √
1 1
P nȲn − √ Yrp ≥ ε ≤ ε−2 Var
nȲn − √ Yrp .
n n
248 The Box–Jenkins Program: A Case Study
√
1
lim lim sup P nȲn − √ Yrp ≥ ε
p→∞ n→∞ n
1
≤ lim Var(Y1 + · · · + Ym ) = 0.
p→∞ (p + m)ε2
√ D
Hence, nȲn → N (0, Vm ) as n → ∞.
7.2.17. Let (Yt )t∈Z be the MA(q)-process Yt = qu=0 bu εt−u
P
Example P
q
satisfying u=0 bu 6= 0 and b0 = 1, where the εt are independent,
identically distributed and square integrable random variables with
E(εt ) = 0 and Var(εt ) = σ 2 > 0. Since the process is a q-dependent
strictly stationary sequence with
q q q
!
X X X
2
γ(k) = E bu εt−u bv εt+k−v =σ bu bu+k
u=0 v=0 u=0
σ 2 ( qj=0 bj )2 > 0.
P
2 P
Theorem 7.2.16 (ii) then implies that Ȳn ≈ N (0, σn ( qj=0 bj )2 ) for
sufficiently large n.
P∞
Theorem 7.2.18. Let Yt = u=−∞ bu Zt−u , t ∈ Z, be a station-
ary process with absolutely summable real valued filter (bu )u∈Z , where
(Zt )t∈Z is a process of independent, identically distributed, square inte-
grable and P real valued random variables with E(Zt ) = 0 and Var(Zt ) =
σ > 0. If ∞
2
u=−∞ bu 6= 0, then, for n ∈ N,
∞
√ D
2
X 2
nȲn → N 0, σ bu as n → ∞, as well as
u=−∞
∞
σ2 X 2
Ȳn ≈ N 0, bu for n sufficiently large,
n u=−∞
1
Pn
where Ȳn := n i=1 Yi .
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 249
(m) (m)
Proof. We approximate Yt by Yt := m
P
u=−m bu Zt−u and let Ȳn :=
1
Pn (m)
n t=1 Yt . With Theorem 7.2.16 and Example 7.2.17 above, we
attain, as n → ∞,
√ (m) D (m)
nȲn → Y ,
D
In order to show Ȳn → Y as n → ∞ by Theorem 7.2.7 and Cheby-
chev’s inequality, we have to proof
√
lim lim sup Var( n(Ȳn − Ȳn(m) )) = 0.
m→∞ n→∞
as n → ∞.
250 The Box–Jenkins Program: A Case Study
Yn = Xn ap + En (7.16)
The considered process (Yt )t∈Z satisfies the stationarity condition. So,
Theorem
P 2.2.3 provides the almost surely stationary solution Yt =
u≥0 bu εt−u , t ∈ Z. Consequently, we are able to approximate Yt
(m) (m)
:= m
P
by Yt u=0 bu εt−u and furthermore Wt by the term Wt :=
(m) (m) T T
(Yt−1 εt , . . . , Yt−p εt ) . Taking an arbitrary vector λ := (λ1 , . . . , λp ) ∈
Rp , λ 6= 0, we gain a strictly stationary (m + p)-dependent sequence
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 253
(m)
(Rt )t∈Z defined by
(m) (m) (m) (m)
Rt := λT Wt = λ1 Yt−1 εt + · · · + λp Yt−p εt
Xm X m
= λ1 bu εt−u−1 εt + λ2 bu εt−u−2 εt + . . .
u=0 u=0
X m
+ λp bu εt−u−p εt
u=0
(m)
with expectation E(Rt ) = 0 and variance
(m) (m)
Var(Rt ) = λT Var(Wt )λ = σ 2 λT Σ(m)
p λ > 0,
(m)
as n → ∞, where U (m) is N (0, σ 2 Σp )-distributed. Since, entrywise,
(m) D
σ 2 Σp → σ 2 Σp as m → ∞, we attain λT U (m) → λT U as m → ∞
by the dominated convergence theorem, where U follows the normal
distribution N (0, σ 2 Σp ). With Theorem 7.2.7, it only remains to show
that, for every ε > 0,
n 1 X n n
T 1 X T (m) o
lim lim sup P √ λ Wt − √ λ Wt > ε = 0
m→∞ n→∞ n t=1 n t=1
254 The Box–Jenkins Program: A Case Study
to establish
n
1 X T D
√ λ Wt → λT U
n t=1
(m)
being independent of n. Since almost surely Wt → Wt as m → ∞,
Chebychev’s inequality finally gives
n 1 n o
(m)
X
T
lim lim sup P √ λ (Wt − Wt ) ≥ ε
m→∞ n→∞ n t=1
1 (m)
≤ lim 2 λT Var(Wt − Wt )λ = 0.
m→∞ ε
n−k n−k
1 X X
√ (Yt+k − Ȳn )(Yt − Ȳn ) − Ys Ys+k
n t=1
s=1−k
0 n−k
1 X 1 X n−k
= −√ Yt Yt+k − √ Ȳn Ys+k + Ys + √ Ȳn2 , (7.19)
n n s=1
n
t=1−k
Focusing now on the second term in (7.18), Lemma 7.2.13 implies that
1 T P
Xn Yn → γ(p)
n
as n → ∞. Hence, we need to show the convergence in probability
√ −1 P
n ||Ĉp,n − n(XnT Xn )−1 || → 0 as n → ∞,
−1
where ||Ĉp,n − n(XnT Xn )−1 || denotes the Euclidean norm of the p2
−1
dimensional vector consisting of all entries of the p × p-matrix Ĉp,n −
n(XnT Xn )−1 . Thus, we attain
√ −1
n ||Ĉp,n − n(XnT Xn )−1 || (7.21)
√ −1 −1
= n ||Ĉp,n (n XnT Xn − Ĉp,n )n(XnT Xn )−1 ||
√ −1
≤ n ||Ĉp,n || ||n−1 XnT Xn − Ĉp,n || ||n(XnT Xn )−1 ||, (7.22)
where
√
n ||n−1 XnT Xn − Ĉp,n ||2
p X
X p X n
−1/2 −1/2
=n n Ys−i Ys−j
i=1 j=1 s=1
n−|i−j|
X 2
−1/2
−n (Yt+|i−j| − Ȳn )(Yt − Ȳn ) .
t=1
or, equivalently,
n−i n−k
P
X X
−1/2 −1/2
n Ys Ys−j+i − n Yt Yt+k → 0 as n → ∞.
s=1−i t=1−k
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 257
entails that
n 0 n o
−1/2 X P
X
−1/2
P n Ys Ys−j+i − n Yt+i−j Yt ≥ ε → 0 as n → ∞.
s=1−i t=n−i+1
−1 P P
This completes the proof, since Ĉp,n → Σ−1 T
p as well as n(Xn Xn )
−1
→
−1
Σp as n → ∞ by Lemma 7.2.13.
258 The Box–Jenkins Program: A Case Study
−1 1
σkk = Vyy,x = ,
Var(Yk − Ŷk,k−1 )
Yn = OP (hn )
Yn = oP (hn )
Yn = O P (hn )
Yn = oP (hn )
As Yni = OP (1), for arbitrary η > 0, there exists κ > 0 such that
P {|Yni | > κ} < η for all n ∈ N. Choosing δ := κ, we finally attain
Lemma 7.3.4. Consider two sequences (Yn )n∈N and (Xn )n∈N of real
valued random k-vectors, all defined on the same probability space
P
(Ω, A, P), such that Yn − Xn = oP (1). If furthermore Yn → Y as
P
n → ∞, then Xn → Y as n → ∞.
which implies
k
X ∂ϑ
ϑ(Yn ) = ϑ(c) + (c)(Yni − ci ) + oP (hn ).
i=1
∂y i
Proof. The Taylor series expansion (e.g. Seeley, 1970, Section 5.3)
gives, as y → c,
k
X ∂ϑ
ϑ(y) = ϑ(c) + (c)(yi − ci ) + o(|y − c|),
i=1
∂y i
The Delta-Method
Theorem 7.3.7. (The Multivariate Delta-Method) Consider a se-
quence of real valued random k-vectors (Yn )n∈N , all defined on the
D
same probability space (Ω, A, P), such that h−1 n (Yn − µ) → N (0, Σ)
as n → ∞ with µ := (µ, . . . , µ)T ∈ Rk , hn → 0 as n → ∞ and
Σ := (σrs )1≤r,s≤k being a symmetric and positive definite k ×k-matrix.
Moreover, let ϑ = (ϑ1 , . . . , ϑm )T : y → ϑ(y) be a function from Rk
into Rm , m ≤ k, where each ϑj , 1 ≤ j ≤ m, is continuously differen-
tiable in a neighborhood of µ. If
∂ϑ
j
∆ := (y)
∂yi µ 1≤j≤m,1≤i≤k
7.3 Asymptotic Normality of Autocorrelation Estimator 263
as n → ∞.
or, respectively,
h−1 −1
n (ϑ(Yn ) − ϑ(µ)) = hn ∆(Yn − µ) + oP (1).
D
We know h−1 T
n ∆(Yn − µ) → N (0, ∆Σ∆ ) as n → ∞, thus, we con-
D
clude from Lemma 7.2.8 that h−1 T
n (ϑ(Yn ) − ϑ(µ)) → N (0, ∆Σ∆ ) as
n → ∞ as well.
Observe that
4
ασ
if g = h = i = j,
E(εg εh εi εj ) = σ 4 if g = h 6= i = j, g = i 6= h = j, g = j 6= h = i,
0 elsewhere.
We therefore find
E(Yt Yt+s Yt+s+r Yt+s+r+v )
X ∞ X∞ X∞ X∞
= bg bh+s bi+s+r bj+s+r+v E(εt−g εt−h εt−i εt−j )
g=−∞ h=−∞ i=−∞ j=−∞
X∞ X∞
4
= σ (bg bg+s bi+s+r bi+s+r+v + bg bg+s+r bi+s bi+s+r+v
g=−∞ i=−∞
+ bg bg+s+r+v bi+s bi+s+r )
X∞
4
+ (α − 3)σ bj bj+s bj+s+r bj+s+r+v
j=−∞
∞
X
= (α − 3)σ 4 bg bg+s bg+s+r bg+s+r+v
g=−∞
+ γ(s)γ(v) + γ(r + s)γ(r + v) + γ(r + s + v)γ(r).
7.3 Asymptotic Normality of Autocorrelation Estimator 265
Applying the result to the covariance of γ̃n (k) and γ̃n (l) provides
Defining
The absolutely summable filter (bu )u∈Z entails the absolute summa-
bility of the sequence (Cm )m∈Z . Hence, by the dominated convergence
266 The Box–Jenkins Program: A Case Study
Lemma 7.3.9. Consider the stationary process (Yt )t∈Z from the previ-
ous Lemma 7.3.8 satisfying E(ε4t ) := ασ 4 < ∞, α > 0, b0 = 1 and bu =
0 for u < 0. Let γ̃p,n := (γ̃n (0), . . . , γ̃n (p))T , γp := (γ(0), . . . , γ(p))T
for p ≥ 0, n ∈ N, and let the p × p-matrix Σc := (ckl )0≤k,l≤p be given
by
ckl := (α − 3)γ(k)γ(l)
X∞
+ γ(m)γ(m − k + l) + γ(m + l)γ(m − k) .
m=−∞
(q) Pq
Furthermore, consider the MA(q)-process Yt := u=0 bu εt−u , q ∈
N, t ∈ Z, with corresponding autocovariance function γ(k)(q) and the
(q) (q)
p × p-matrix Σc := (ckl )0≤k,l≤p with elements
(q)
ckl :=(α − 3)γ(k)(q) γ(l)(q)
∞
X
(q) (q) (q) (q)
+ γ(m) γ(m − k + l) + γ(m + l) γ(m − k) .
m=−∞
(q)
Then, if Σc and Σc are regular,
√ D
n(γ̃p,n − γp ) → N (0, Σc ),
as n → ∞.
7.3 Asymptotic Normality of Autocorrelation Estimator 267
(q)
Proof. Consider the MA(q)-process Yt := qu=0 bu εt−u , q ∈ N, t ∈
P
Z, with corresponding autocovariance function γ(k)(q) . We define
(q) (q) (q)
γ̃n (k)(q) := n1 nt=1 Yt Yt+k as well as γ̃p,n := (γ̃n (0)(q) , . . . , γ̃n (p)(q) )T .
P
Defining moreover
(q) (q) (q) (q) (q) (q) (q)
Yt := (Yt Yt , Yt Yt+1 , . . . , Yt Yt+p )T , t ∈ Z,
(q)
we attain a strictly stationary (q+p)-dependence sequence (λT Yt )t∈Z
for any λ := (λ1 , . . . , λp+1 )T ∈ Rp+1 , λ 6= 0. Since
n
1 X (q) (q)
Yt = γ̃p,n ,
n t=1
(q)
as n → ∞, where γp := (γ(0)(q) , . . . , γ(p)(q) )T . Since, entrywise,
(q)
Σc → Σc as q → ∞, the dominated convergence theorem gives
n
D
X
−1/2
n λT Yt − n1/2 λT γp → N (0, λT Σc λ)
t=1
√ P P∞
We know from Theorem 7.2.18 that either nȲ → 0, if u=0 bu =0
or
∞
√ D
2
X 2
nȲ → N 0, σ bu
u=0
P∞
as
√ n → ∞, if u=0 bu 6= 0. This entails the boundedness in probability
nȲ = OP (1) by Lemma 7.3.3. By Markov’s inequality, it follows,
for every ε > 0,
n 1 n
X o 1 1 n
X
P √ Yt+k Yt ≥ ε ≤ E √ Yt+k Yt
n ε n
t=n−k+1 t=n−k+1
k
≤ √ γ(0) → 0 as n → ∞,
ε n
Pn P
which shows that n−1/2 t=n−k+1 Yt+k Yt → 0 as n → ∞. Applying
Lemma 7.2.12 leads to
n−k
n−k 1X P
Ȳ − (Yt+k + Yt ) → 0 as n → ∞.
n n t=1
Proof. Note that r̂n (k) is well defined for sufficiently large n since
P
ĉn (0) → γ(0) = σ 2 ∞ 2
P
u=−∞ bu > 0 as n → ∞ by (7.14). Let ϑ be
x
the function defined by ϑ((x0 , x1 , . . . , xp )T ) = ( xx01 , xx02 , . . . , xp0 )T , where
xs ∈ R for s ∈ {0, . . . , p} and x0 6= 0. The multivariate delta-method
7.3.7 and Lemma 7.3.10 show that
ĉn (0) γ(0)
√ √
n ϑ ... − ϑ ... = n(r̂p,n − ρp ) → N 0, ∆Σc ∆T
D
p
X p
X
rij = δik ckl δjl
k=0 l=0
p
1 X
= − ρ(i) (c0l δjl + cil δjl )
γ(0)2
l=0
1
= ρ(i)ρ(j)c00 − ρ(i)c0j − ρ(j)c10 + c11
γ(0)2
X ∞
= ρ(i)ρ(j) (α − 3) + 2ρ(m)2
m=−∞
∞
X
0 0
− ρ(i) (α − 3)ρ(j) + 2ρ(m )ρ(m + j)
m0 =−∞
X ∞
∗ ∗
− ρ(j) (α − 3)ρ(i) + 2ρ(m )ρ(m − i) + (α − 3)ρ(i)ρ(j)
m∗ =−∞
∞
X
+ ρ(m°)ρ(m° − i + j) + ρ(m° + j)ρ(m° − i)
m°=−∞
∞
X
= 2ρ(i)ρ(j)ρ(m)2 − 2ρ(i)ρ(m)ρ(m + j)
m=−∞
− 2ρ(j)ρ(m)ρ(m − i) + ρ(m)ρ(m − i + j) + ρ(m + j)ρ(m − i) .
(7.26)
We may write
∞
X ∞
X
ρ(j)ρ(m)ρ(m − i) = ρ(j)ρ(m + i)ρ(m)
m=−∞ m=−∞
as well as
∞
X ∞
X
ρ(m)ρ(m − i + j) = ρ(m + i)ρ(m + j). (7.27)
m=−∞ m=−∞
using (7.27).
12 /* Compute mean */
13 PROC MEANS DATA=donau;
14 VAR discharge;
15 RUN;
16
17 /* Graphical options */
18 SYMBOL1 V=DOT I=JOIN C=GREEN H=0.3 W=1;
19 AXIS1 LABEL=(ANGLE=90 ’Discharge’);
20 AXIS2 LABEL=(’January 1985 to December 2004’) ORDER=(’01JAN85’d ’01
,→JAN89’d ’01JAN93’d ’01JAN97’d ’01JAN01’d ’01JAN05’d);
21 AXIS3 LABEL=(ANGLE=90 ’Autocorrelation’);
22 AXIS4 LABEL=(’Lag’) ORDER = (0 1000 2000 3000 4000 5000 6000 7000);
23 AXIS5 LABEL=(’I(’ F=CGREEK ’l)’);
24 AXIS6 LABEL=(F=CGREEK ’l’);
25
26 /* Generate data plot */
27 PROC GPLOT DATA=donau;
28 PLOT discharge*date=1 / VREF=201.6 VAXIS=AXIS1 HAXIS=AXIS2;
29 RUN;
30
31 /* Compute and plot empirical autocorrelation */
32 PROC ARIMA DATA=donau;
33 IDENTIFY VAR=discharge NLAG=7000 OUTCOV=autocorr NOPRINT;
34 PROC GPLOT DATA=autocorr;
35 PLOT corr*lag=1 /VAXIS=AXIS3 HAXIS=AXIS4 VREF=0;
36 RUN;
37
38 /* Compute periodogram */
39 PROC SPECTRA DATA=donau COEF P OUT=data1;
40 VAR discharge;
41
42 /* Adjusting different periodogram definitions */
43 DATA data2;
44 SET data1(FIRSTOBS=2);
45 p=P_01/2;
46 lambda=FREQ/(2*CoNSTANT(’PI’));
276 The Box–Jenkins Program: A Case Study
38 RUN;
39
40 /* Compute and plot autocorrelation of adjusted data */
41 PROC ARIMA DATA=seasad;
42 IDENTIFY VAR=sa NLAG=1000 OUTCOV=corrseasad NOPRINT;
43
44 /* Add confidence intervals */
45 DATA corrseasad;
46 SET corrseasad;
47 u99=0.079;
48 l99=-0.079;
49
50 PROC GPLOT DATA=corrseasad;
51 PLOT corr*lag=1 u99*lag=2 l99*lag=2 / OVERLAY VAXIS=AXIS5 HAXIS=
,→AXIS6 VREF=0 LEGEND=LEGEND1;
52 RUN;
53
54 /* Compute periodogram of adjusted data */
55 PROC SPECTRA DATA=seasad COEF P OUT=data1;
56 VAR sa;
57
58 /* Adjust different periodogram definitions */
59 DATA data2;
60 SET data1(FIRSTOBS=2);
61 p=P_01/2;
62 lambda=FREQ/(2*CoNSTANT(’PI’));
63 DROP P_01 FREQ;
64
65 /* Plot periodogram of adjusted data */
66 PROC GPLOT DATA=data2(OBS=100);
67 PLOT p*lambda=1 / VAXIS=AXIS3 HAXIS=AXIS4;
68 RUN;
69
70 /* Test for stationarity */
71 PROC ARIMA DATA=seasad;
72 IDENTIFY VAR=sa NLAG=100 STATIONARITY=(ADF=(7,8,9));
73 RUN; QUIT;
The seasonal influences of the periods of riodogram are created from the adjusted data.
both 365 and 2433 days are removed by The confidence interval I := [−0.079, 0.079]
the SEASONALITY option in the procedure pertaining to the 3σ-rule is displayed by the
PROC TIMESERIES. By MODE=ADD, an addi- variables l99 and u99.
tive model for the time series is assumed. The The final part of the program deals with the test
mean correction of -201.6 finally completes the for stationarity. The augmented Dickey-Fuller
adjustment. The dates of the original Donau- test is initiated by ADF in the STATIONARITY
woerth dataset need to be restored by means option in PROC ARIMA. Since we deal with true
of MERGE within the DATA statement. Then, the ARMA(p, q)-processes, we have to choose a
same steps as in the previous Program 7.4.1 high-ordered autoregressive model, whose or-
(donauwoerth firstanalysis.sas) are executed, der selection is specified by the subsequent
i.e., the general plot as well as the plots of numbers 7, 8 and 9.
the empirical autocorrelation function and pe-
7.4 First Examinations 281
2 1 2 2 2
σq := 1 + 2r(1) + 2r(2) · · · + 2r(q) . (7.30)
n
Since r̂n (i) is asymptotically normal distributed (see Section 7.3), we
will reject the hypothesis H0 that actually a MA(q)-process is underly-
ing the time series ỹ1 , . . . , ỹn , if significantly more than 1 − α percent
of the empirical autocorrelation coefficients with lag greater than q
lie outside of the confidence interval I := [−σq · q1−α/2 , σq · q1−α/2 ]
of level α, where q1−α/2 denotes the (1 − α/2)-quantile of the stan-
dard p normal distribution. Setting q = 10, we attain in our case
σq ≈ 5/7300 ≈ 0.026. By the 3σ rule, we would expect that almost
all empirical autocorrelation coefficients with lag greater than 10 will
be elements of the confidence interval J := [−0.079, 0.079]. Hence,
Plot 7.4.2b makes us doubt that we can express the time series by a
mere MA(q)-process with a small q ≤ 10.
The following figure of the empirical partial autocorrelation function
can mislead to the deceptive conclusion that an AR(3)-process might
282 The Box–Jenkins Program: A Case Study
partial partial
lag lag
autocorrelation autocorrelation
1 0.92142 44 0.03368
2 -0.33057 45 -0.02412
3 0.17989 47 -0.02892
4 0.02893 61 -0.03667
5 0.03871 79 0.03072
6 0.05010 81 -0.03248
7 0.03633 82 0.02521
15 0.03937 98 -0.02534
1 /* donauwoerth_pacf.sas */
2 TITLE1 ’Partial Autocorrelation’;
3 TITLE2 ’Donauwoerth Data’;
4
5 /* Note that this program requires the file ’seasad’ generated by the
,→previous program (donauwoerth_adjustment.sas) */
6
7 /* Graphical options */
8 SYMBOL1 V=DOT I=JOIN C=GREEN H=0.5;
9 SYMBOL2 V=NONE I=JOIN C=BLACK L=2;
10 AXIS1 LABEL=(ANGLE=90 ’Partial Autocorrelation’) ORDER=(-0.4 TO 1 BY
,→0.1);
11 AXIS2 LABEL=(’Lag’);
12 LEGEND2 LABEL=NONE VALUE=(’Partial autocorrelation of adjusted series’
13 ’Lower 95-percent confidence limit’ ’Upper 95-percent confidence
,→limit’);
14
15 /* Compute partial autocorrelation of the seasonal adjusted data */
16 PROC ARIMA DATA=seasad;
17 IDENTIFY VAR=sa NLAG=100 OUTCOV=partcorr NOPRINT;
18 RUN;
19
20 /* Add confidence intervals */
21 DATA partcorr;
22 SET partcorr;
23 u95=0.0234;
24 l95=-0.0234;
25
284 The Box–Jenkins Program: A Case Study
coefficients.
To specify the order of an invertible ARMA(p, q)-model
Information Criterions
To choose the orders p and q of an ARMA(p, q)-process one commonly
takes the pair (p, q) minimizing some information function, which is
based on the loglikelihood function.
For Gaussian ARMA(p, q)-processes we have derived the loglikelihood
function in Section 2.3
l(ϑ|y1 , . . . , yn )
n 1 1
= log(2πσ 2 ) − log(det Σ0 ) − 2 Q(ϑ|y1 , . . . , yn ). (7.32)
2 2 2σ
The maximum likelihood estimator ϑ̂ := (σ̂ 2 , µ̂, â1 , . . . , âp , b̂1 , . . . , b̂q ),
maximizing l(ϑ|y1 , . . . , yn ), can often be computed by deriving the
ordinary derivative and the partial derivatives of l(ϑ|y1 , . . . , yn ) and
equating them to zero. These are the so-called maximum likelihood
equations, which ϑ̂ necessarily has to satisfy. Holding σ 2 , a1 , . . . , ap , b1 ,
286 The Box–Jenkins Program: A Case Study
. . . , bq , fix we obtain
∂l(ϑ|y1 , . . . , yn ) 1 ∂Q(ϑ|y1 , . . . , yn )
=− 2
∂µ 2σ ∂µ
1 ∂
T 0 −1
=− 2 (y − µ) Σ (y − µ)
2σ ∂µ
1 ∂ T 0 −1 −1
=− 2 y Σ y + µT Σ 0 µ
2σ ∂µ
T 0 −1 T 0 −1
−µ·y Σ 1−µ·1 Σ y
1 −1 −1
=− 2
(2µ · 1T Σ0 1 − 2 · 1T Σ0 y),
2σ
where 1 := (1, 1, . . . , 1)T , y := (y1 , . . . , yn )T ∈ Rn . Equating the par-
tial derivative to zero yields finally the maximum likelihood estimator
µ̂ of µ
−1
1T Σ̂0 y
µ̂ = −1 , (7.33)
1T Σ̂0 1
where Σ̂0 equals Σ0 , where the unknown parameters a1 , . . . , ap and
b1 , . . . , bq are replaced by maximum likelihood estimators â1 , . . . , âp
and b̂1 , . . . , b̂q .
The maximum likelihood estimator σ̂ 2 of σ 2 can be achieved in an
analogous way with µ, a1 , . . . , ap , b1 , . . . , bq being held fix. From
∂l(ϑ|y1 , . . . , yn ) n Q(ϑ|y1 , . . . , yn )
2
=− 2+ ,
∂σ 2σ 2σ 4
we attain
Q(ϑ̂|y1 , . . . , yn )
σ̂ 2 = . (7.34)
n
The computation of the remaining maximum likelihood estimators
â1 , . . . , âp , b̂1 , . . . , b̂q is usually a computer intensive problem.
Since the maximized loglikelihood function only depends on the model
assumption Mp,q , lp,q (y1 , . . . , yn ) := l(ϑ̂|y1 , . . . , yn ) is suitable to cre-
ate a measure for comparing the goodness of fit of different models.
The greater lp,q (y1 , . . . , yn ), the higher the likelihood that y1 , . . . , yn
actually results from a Gaussian ARMA(p, q)-process.
7.5 Order Selection 287
l(ϑ̂|y1 , . . . , yn )
n 1 1
= − log(2πσ̂ 2 ) − log(det Σ̂0 ) − 2 Q(ϑ̂|y1 , . . . , yn )
2 2 2σ̂
as well as
l(ϑ̂|z1 , . . . , zn )
n 1 1
= − log(2πσ̂ 2 ) − log(det Σ̂0 ) − 2 Q(ϑ̂|z1 , . . . , zn ).
2 2 2σ̂
This leads to the representation
l(ϑ̂|z1 , . . . , zn )
1
= l(ϑ̂|y1 , . . . , yn ) − Q(ϑ̂|z1 , . . . , z n ) − Q(ϑ̂|y 1 , . . . , y n ) . (7.35)
2σ̂ 2
The asymptotic distribution of Q(ϑ̂|Z1 , . . . , Zn ) − Q(ϑ̂|Y1 , . . . , Yn ),
where Z1 , . . . , Zn and Y1 , . . . , Yn are independent samples resulting
from the ARMA(p, q)-process, can be computed for n sufficiently
large, which reveals an asymptotic expectation of 2σ 2 (p + q) (see for
example Brockwell and Davis, 1991, Section 8.11). If we now replace
the term Q(ϑ̂|z1 , . . . , zn ) − Q(ϑ̂|y1 , . . . , yn ) in (7.35) by its approxi-
mately expected value 2σ̂ 2 (p + q), we attain with (7.34)
l(ϑ̂|z1 , . . . , zn ) = l(ϑ̂|y1 , . . . , yn ) − (p + q)
n 1 1
= − log(2πσ̂ 2 ) − log(det Σ̂0 ) − 2 Q(ϑ̂|y1 , . . . , yn ) − (p + q)
2 2 2σ̂
n n 1 n
= − log(2π) − log(σ̂ 2 ) − log(det Σ̂0 ) − − (p + q).
2 2 2 2
288 The Box–Jenkins Program: A Case Study
The greater the value of l(ϑ̂|z1 , . . . , zn ), the more precisely the esti-
mated parameters σ̂ 2 , µ̂, â1 , . . . , âp , b̂1 , . . . , b̂q reproduce the true under-
lying ARMA(p, q)-process. Due to this result, Akaike’s Information
Criterion (AIC) is defined as
2 1
AIC := − l(ϑ̂|z1 , . . . , zn ) − log(2π) − log(det Σ̂0 ) − 1
n n
2(p + q)
= log(σ̂ 2 ) +
n
for sufficiently large n, since then n−1 log(det Σ̂0 ) becomes negligible.
Thus, the smaller the value of the AIC, the greater the loglikelihood
function l(ϑ̂|z1 , . . . , zn ) and, accordingly, the more precisely ϑ̂ esti-
mates ϑ.
The AIC adheres to the principle of parsimony as the model orders
are included in an additional term, the so-called penalty function.
However, comprehensive studies came to the conclusion that the AIC
has the tendency to overestimate the order p. Therefore, modified cri-
terions based on the AIC approach were suggested, like the Bayesian
Information Criterion (BIC)
2 (p + q) log(n)
BIC(p, q) := log(σ̂p,q )+
n
and the Hannan-Quinn Criterion
Remark 7.5.1. Although these criterions are derived under the as-
sumption of an underlying Gaussian process, they nevertheless can
tentatively identify the orders p and q, even if the original process is
not Gaussian. This is due to the fact that the criterions can be re-
garded as a measure of the goodness of fit of the estimated covariance
matrix Σ̂0 to the time series (Brockwell and Davis, 2002).
7.5 Order Selection 289
MINIC Method
The MINIC method is based on the previously introduced AIC and
BIC. The approach seeks the order pair (p, q) minimizing the BIC-
Criterion
2 (p + q) log(n)
BIC(p, q) = log(σ̂p,q )+
n
in a chosen order range of pmin ≤ p ≤ pmax and qmin ≤ q ≤ qmax ,
2
where σ̂p,q is an estimate of the variance Var(εt ) = σ 2 of the errors
εt in the model (7.31). In order to estimate σ 2 , we first have to esti-
mate the model parameters from the zero-mean time series ỹ1 , . . . , ỹn .
Since their computation by means of the maximum likelihood method
is generally very extensive, the parameters are estimated from the re-
gression model
S 2 (a1 , . . . , ap , b1 , . . . , bq )
X n
= (ỹt − a1 ỹt−1 − · · · − ap ỹt−p − b1 ε̂t−1 − · · · − bq ε̂t−q )2
t=max{p,q}+1
Xn
= ε̂2t
t=max{p,q}+1
where
ε̃ˆt := ỹt − â1 ỹt−1 − · · · − âp ỹt−p − b̂1 ε̂t−1 − · · · − b̂q ε̂t−q ,
290 The Box–Jenkins Program: A Case Study
ESACF Method
The second method provided by SAS is the ESACF method , devel-
oped by Tsay and Tiao (1984). The essential idea of this approach
is based on the estimation of the autoregressive part of the underly-
ing process in order to receive a pure moving average representation,
whose empirical autocorrelation function, the so-called extended sam-
ple autocorrelation function, will be close to 0 after the lag of order.
For this purpose, we utilize a recursion initiated by the best linear
(k) (k)
approximation with real valued coefficients Ŷt := λ0,1 Yt−1 + · · · +
(k)
λ0,k Yt−k of Yt in the model (7.31) for a chosen k ∈ N. Recall that the
coefficients satisfy
(k)
. . . ρ(k − 1) λ0,1
ρ(1) ρ(0)
.
.. = .
.. ... .
.. ...
(7.38)
ρ(k) ρ(−k + 1) . . . ρ(0) (k)
λ0,k
and are uniquely determined by Theorem 7.1.1. The lagged residual
(k) (k) (k) (k)
Rt−1,0 , Rs,0 := Ys − λ0,1 Ys−1 − · · · − λ0,k Ys−k , s ∈ Z, is a linear combi-
7.5 Order Selection 291
(i) (i)
where ωs (i) := ρ(s) − λ0,1 ρ(s + 1) − · · · − λ0,i ρ(s + i), s ∈ Z, i ∈ N.
Note that ωs (k) = 0 for s = −1, · · · − k by (7.38).
The recursion now proceeds with the best linear approximation of Yt
(k) (k)
based on Yt−1 , . . . , Yt−k , Rt−1,1 , Rt−2,0 with
(k) (k) (k) (k) (k)
Rt−1,1 := Yt−1 − λ1,1 Yt−2 − · · · − λ1,k Yt−k−1 − λ1,k+1 Rt−2,0
being the lagged residual of the previously executed regression. After
l > 0 such iterations, we get
(k+l) (k+l)
λ0,1 Yt−1 + · · · + λ0,k+l Yt−k−l
(k) (k)
= λl,1 Yt−1 + · · · + λl,k Yt−k
(k) (k) (k) (k) (k) (k)
+ λl,k+1 Rt−1,l−1 + λl,k+2 Rt−2,l−2 + · · · + λl,k+l Rt−l,0 ,
292 The Box–Jenkins Program: A Case Study
where
0
l
(k) (k) (k) (k) (k)
X
Rt0 ,l0 := Yt0 − λl0 ,1 Yt0 −1 − · · · − λl0 ,k Yt0 −k − λl0 ,k+i Rt0 −i,l0 −i
i=1
P0 (k) (k)
for t0 ∈ Z, l0 ∈ N0 , i=1 λl0 ,k+i Rt0 −i,l0 −i := 0 and where all occurring
(k)
coefficients are real valued. Since, on the other hand, Rt,l = Yt −
(k+l) (k+l)
λ0,1 Yt−1 − · · · − λ0,k+l Yt−k−l , the coefficients satisfy
1 0 ··· 0
(k+l−1) ... ..
Ik −λ0,1 1 .
(k)
λl,1
−λ0,2
(k+l−1)
−λ0,1
(k+l−2) ...
λ(k)
(k+l)
λ0,1
.. .. l,2 ..
. . 0
λ(k)
.
l,3
1 = ,
.. .. .. .
. . (k)
−λ0,1 . ..
(k)
.. .. ... .. λl,k+l−1 (k+l)
. . .
0lk λ0,k+l
(k+l−1) (k+l−2) (k) (k)
−λ0,k+l−2 −λ0,k+l−3 ··· −λ0,k−1 λl,k+l
(k+l−1) (k+l−2) (k)
−λ0,k+l−1 −λ0,k+l−2 ··· −λ0,k
(k) (k)
If we assume that the coefficients λl,1 , . . . , λl,k+l are uniquely deter-
7.5 Order Selection 293
Using partial derivatives and equating them to zero yields the follow-
294 The Box–Jenkins Program: A Case Study
for n sufficiently large and k sufficiently small, where the terms c(i) =
1
Pn−i
n t=1 ỹt+i ỹt , i = 1, . . . , k, denote the empirical autocorrelation coef-
ficients at lag i. We define until the end of this section for a given
real valued series (ci )i∈N
h
X h
Y
ci := 0 if h < j as well as ci := 1 if h < j.
i=j i=j
for t = 1 + k + l, . . . , n, where
0
k l
(k) (k) (k) (k)
X X
rt0 ,l0 := ỹt0 − ỹt0 −i λ̂l0 ,i − λ̂l0 ,k+j rt0 −j,l0 −j (7.43)
i=1 j=1
for t0 ∈ Z, l0 ∈ N0 .
7.5 Order Selection 295
(i) (i)
where ω̃s (i) := c(s)−λ0,1 c(s+1)−· · ·−λ0,i c(s+i), s ∈ Z, i ∈ N. Since
1
Pn−k P
n t=1 Yt−k Yt → γ(k) as n → ∞ by Theorem 7.2.13, we attain, for
(p) (p) (p) (p)
k = p, that (λ̂l,1 , . . . , λ̂l,p )T ≈ (λl,1 , . . . , λl,p )T = (a1 , . . . , ap , )T for n
sufficiently large, if, actually, the considered model (7.31) with p = k
and q ≤ l is underlying the zero-mean time series ỹ1 , . . . , ỹn . Thereby,
(p) (p) (p) (p)
Zt,l :=Yt − λ̂l,1 Yt−1 − λ̂l,2 Yt−2 − · · · − λ̂l,p Yt−p
≈Yt − a1 Yt−1 − a2 Yt−2 − · · · − ap Yt−p
=εt + b1 εt−1 + · · · + bq εt−q
where
α(B) := 1 − a1 B − · · · − ap B p ,
β(B) := 1 + b1 B + · · · + bq B q
Yt = α(B)−1 β(B)[εt ]
ESACF Algorithm
The computation of the relevant autoregressive coefficients in the
ESACF-approach follows a simple algorithm
(k+1)
(k) (k+1) (k) λ̂l−1,k+1
λ̂l,r = λ̂l−1,r − λ̂l−1,r−1 (k) (7.45)
λ̂l−1,k
(·)
for k, l ∈ N not too large and r ∈ {1, . . . , k}, where λ̂·,0 = −1. Observe
that the estimations in the previously discussed iteration become less
reliable, if k or l are too large, due to the finite sample size n. Note
that the formula remains valid for r < 1, if we define furthermore
(·)
λ̂·,i = 0 for negative i. Notice moreover that the algorithm as well as
(k ∗ )
the ESACF approach can only be executed, if λ̂l∗ ,k∗ turns out to be
unequal to zero for l∗ ∈ {0, . . . , l − 1} and k ∗ ∈ {k, . . . , k + l − l∗ − 1},
which, in general, is satisfied. Under latter condition above algorithm
can be shown inductively.
Proof. For k ∈ N not too large and t = k + 2, . . . , n,
(k+1) (k+1) (k+1)
ŷt,0 = λ̂0,1 ỹt−1 + · · · + λ̂0,k+1 ỹt−k−1
as well as
(k) (k) (k)
ŷt,1 = λ̂1,1 ỹt−1 + · · · + λ̂1,k ỹt−k +
(k) (k) (k)
+ λ̂1,k+1 ỹt−1 − λ̂0,1 ỹt−2 − · · · − λ̂0,k ỹt−k−1
with
h s1 +sX
2 +···=s s i
(k) (k) ∗(k)
Y
Φs,l0 := (−λ̂l0 ,k+s1 ) − λ̂l0 −(s−s2 −···−sj ),k+sj ,
sj =0 if sj−1 =0 j=2
where
(
∗(k) 1 if sj = 0,
(−λ̂l0 −(s−s2 −···−sj ),k+sj ) = (k)
−λ̂l0 −(s−s2 −···−sj ),k+sj else.
So, let the assertion (7.47) be shown for one l0 ∈ N0 . The induction
300 The Box–Jenkins Program: A Case Study
we get
(k) (k) (k+1) (k) (k+1)
λ̂0,k−1 Φl0 ,l0 = λ̂0,k − λ̂1,k Φl0 −1,l0 −1 ,
7.5 Order Selection 301
we finally attain
(k) (k) (k+1) (k+1) (k) (k+1)
λ̂m,k Φl−m,l = λ̂m,k+1 Φl−m−1,l−1 + λ̂m,k Φl−m,l−1 ,
⇔
l k
(k) (k) (k) (k) (k)
X X
λ̂l,r − λ̂l−1,r−1 λ̂l,k+1 + λ̂l−s,i Φs,l
s=2 i=max{0,1−s},s+i=r
l−1 k+1
(k+1) (k+1) (k+1)
X X
= λ̂l−1,r + λ̂l−1−v,j Φv,l−1
v=1 j=max{0,1−v},v+j=r
⇔
(k) (k+1) (k) (k) (k) (k+1)
λ̂l,r = λ̂l−1,r + λ̂l−1,r−1 λ̂l,k+1 − λ̂l−1,r−1 λ̂l−1,k+2
⇔
(k+1)
(k) (k+1) (k) λ̂l−1,k+1
λ̂l,r = λ̂l−1,r − λ̂l−1,r−1 (k)
λ̂l−1,k
SCAN Method
The smallest canonical correlation method, short SCAN-method , sug-
gested by Tsay and Tiao (1985), seeks nonzero vectors a and b ∈ Rk+1
304 The Box–Jenkins Program: A Case Study
T T aT Σyt,k ys,k b
Corr(a yt,k , b ys,k ) = T (7.52)
(a Σyt,k yt,k a)1/2 (bT Σys,k ys,k b)1/2
Σ−1 −1
yt,k yt,k Σyt,k ys,k Σys,k ys,k Σys,k yt,k , (7.55)
R, of Σ−1 −1
yt,k yt,k Σyt,k ys,k Σys,k ys,k Σys,k yt,k with corresponding eigenvalue zero,
since Σys,k yt,k µ1 = 0 · µ1 . Accordingly, the two linear combinations
Yt −a1 Yt−1 −· · ·−ap Yt−p and Ys −a1 Ys−1 −· · ·−ap Ys−p are uncorrelated
for t − s > q by (7.53), which isn’t surprising, since they are repre-
sentations of two MA(q)-processes, which are in deed uncorrelated for
t − s > q.
In practice, given a zero-mean time series ỹ1 , . . . , ỹn with sufficiently
large n, the SCAN method computes the smallest eigenvalues λ̂1,t−s,k
of the empirical counterpart of (7.55)
Σ̂−1 −1
yt,k yt,k Σ̂yt,k ys,k Σ̂ys,k ys,k Σ̂ys,k yt,k ,
Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
---------SCAN-------- --------ESACF--------
p+d q BIC p+d q BIC
3 2 7.234889 4 5 7.238272
2 3 7.234557 1 6 7.237108
4 1 7.234974 2 6 7.237619
1 6 7.237108 3 6 7.238428
6 6 7.241525
8 6 7.243719
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
1 /* donauwoerth_orderselection.sas */
2 TITLE1 ’Order Selection’;
3 TITLE2 ’Donauwoerth Data’;
4
5 /* Note that this program requires the file ’seasad’ generated by
,→program donauwoerth_adjustment.sas */
6
7 /* Order selection by means of MINIC, SCAN and ESACF approach */
8 PROC ARIMA DATA=seasad;
9 IDENTIFY VAR=sa MINIC PERROR=(8:20) SCAN ESACF p=(1:8) q=(1:8);
10 RUN;
11
12 /* Estimate model coefficients for chosen orders p and q */
13 PROC ARIMA DATA=seasad;
14 IDENTIFY VAR=sa NLAG=100 NOPRINT;
15 ESTIMATE METHOD=CLS p=2 q=3;
16 RUN;
17
18 PROC ARIMA DATA=seasad;
19 IDENTIFY VAR=sa NLAG=100 NOPRINT;
20 ESTIMATE METHOD=CLS p=3 q=2;
21 RUN; QUIT;
The discussed order identification methods are of the autoregressive model is specified that
carried out in the framework of PROC ARIMA, is used for estimating the error sequence in
invoked by the options MINIC, SCAN and the MINIC approach. The second part of the
ESACF in the IDENTIFY statement. p = (1 : 8) program estimates the coefficients of the pre-
and q = (1 : 8) restrict the order ranges, ferred ARMA(2, 3)- and ARMA(3, 2)-models,
admitting integer numbers in the range of 1 using conditional least squares, specified by
to 8 for the autoregressive and moving aver- METHOD=CLS in the ESTIMATE statement.
age order, respectively. By PERROR, the order
p q BIC
2 3 7.234557
3 2 7.234889
4 1 7.234974
3 3 7.235731
2 4 7.235736
5 1 7.235882
4 2 7.235901
Yt − 1.61943Yt−1 + 0.63143Yt−2
= εt − 0.34553εt−1 − 0.34821εt−2 − 0.10327εt−3 (7.57)
3. Overfitting
following plots show these comparisons for the identified ARMA(3, 2)-
and ARMA(2, 3)-model.
1 /* donauwoerth_dcheck1.sas */
2 TITLE1 ’Theoretical process analysis’;
3 TITLE2 ’Donauwoerth Data’;
4 /* Note that this program requires the file ’corrseasad’ and ’seasad’
,→generated by program donauwoerth_adjustment.sas */
5
18 PROC IML;
19 PHI={1 -1.85383 1.04408 -0.17899};
20 THETA={1 -0.58116 -0.23329};
21 LAG=1000;
22 CALL ARMACOV(COV, CROSS, CONVOL, PHI, THETA, LAG);
23 N=1:1000;
24 theocorr=COV/7.7608247812;
25 CREATE autocorr32;
7.6 Diagnostic Check 315
26 APPEND;
27 QUIT;
28
29 %MACRO Simproc(p,q);
30
31 /* Graphical options */
32 SYMBOL1 V=NONE C=BLUE I=JOIN W=2;
33 SYMBOL2 V=DOT C=RED I=NONE H=0.3 W=1;
34 SYMBOL3 V=DOT C=RED I=JOIN L=2 H=1 W=2;
35 AXIS1 LABEL=(ANGLE=90 ’Autocorrelations’);
36 AXIS2 LABEL=(’Lag’) ORDER = (0 TO 500 BY 100);
37 AXIS3 LABEL=(ANGLE=90 ’Partial Autocorrelations’);
38 AXIS4 LABEL=(’Lag’) ORDER = (0 TO 50 BY 10);
39 LEGEND1 LABEL=NONE VALUE=(’theoretical’ ’empirical’);
40
41 /* Comparison of theoretical and empirical autocorrelation */
42 DATA compare&p&q;
43 MERGE autocorr&p&q(KEEP=N theocorr) corrseasad(KEEP=corr);
44 PROC GPLOT DATA=compare&p&q(OBS=500);
45 PLOT theocorr*N=1 corr*N=2 / OVERLAY LEGEND=LEGEND1 VAXIS=AXIS1
,→HAXIS=AXIS2;
46 RUN;
47
48 /* Computing of theoretical partial autocorrelation */
49 PROC TRANSPOSE DATA=autocorr&p&q OUT=transposed&p&q PREFIX=CORR;
50 DATA pacf&p&q(KEEP=PCORR0-PCORR100);
51 PCORR0=1;
52 SET transposed&p&q(WHERE=(_NAME_=’THEOCORR’));
53 ARRAY CORRS(100) CORR2-CORR101;
54 ARRAY PCORRS(100) PCORR1-PCORR100;
55 ARRAY P(100) P1-P100;
56 ARRAY w(100) w1-w100;
57 ARRAY A(100) A1-A100;
58 ARRAY B(100) B1-B100;
59 ARRAY u(100) u1-u100;
60 PCORRS(1)=CORRS(1);
61 P(1)=1-(CORRS(1)**2);
62 DO i=1 TO 100; B(i)=0; u(i)=0; END;
63 DO j=2 TO 100;
64 IF j > 2 THEN DO n=1 TO j-2; A(n)=B(n); END;
65 A(j-1)=PCORRS(j-1);
66 DO k=1 TO j-1; u(j)=u(j)-A(k)*CORRS(j-k); END;
67 w(j)=u(j)+CORRS(j);
68 PCORRS(j)=w(j)/P(j-1);
69 P(j)=P(j-1)*(1-PCORRS(j)**2);
70 DO m=1 TO j-1; B(m)=A(m)-PCORRS(j)*A(j-m); END;
71 END;
72 PROC TRANSPOSE DATA=pacf&p&q OUT=plotdata&p&q(KEEP=PACF1) PREFIX=PACF;
73
74 /* Comparison of theoretical and empirical partial autocorrelation */
75 DATA compareplot&p&q;
76 MERGE plotdata&p&q corrseasad(KEEP=partcorr LAG);
316 The Box–Jenkins Program: A Case Study
In all cases there exists a great agreement between the theoretical and
empirical parts. Hence, there is no reason to doubt the validity of any
of the underlying models.
Examination of Residuals
We suppose that the adjusted time series ỹ1 , . . . ỹn was generated by
an invertible ARMA(p, q)-model
for t = 1, . . . n, where ε̂t and ỹt are set equal to zero for t ≤ 0.
Under the model assumptions they show an approximate white noise
behavior and therefore, their empirical autocorrelation function r̂ε̂ (k)
(2.24) will be close to zero for k > 0 not too large. Note that above
estimation of the empirical autocorrelation coefficients becomes less
reliable as k → n.
The following two figures show the empirical autocorrelation functions
of the estimated residuals up to lag 100 based on the ARMA(3, 2)-
model (7.56) and ARMA(2, 3)-model (7.57), respectively. Their pat-
tern indicate uncorrelated residuals in both cases, confirming the ad-
equacy of the chosen models.
1 /* donauwoerth_dcheck2.sas*/
2 TITLE1 ’Residual Analysis’;
3 TITLE2 ’Donauwoerth Data’;
4
5 /* Note that this program requires ’seasad’ and ’donau’ generated by
,→the program donauwoerth_adjustment.sas */
6
7 /* Preparations */
8 DATA donau2;
9 MERGE donau seasad(KEEP=SA);
10
11 /* Test for white noise */
12 PROC ARIMA DATA=donau2;
13 IDENTIFY VAR=sa NLAG=60 NOPRINT;
14 ESTIMATE METHOD=CLS p=3 q=2 NOPRINT;
15 FORECAST LEAD=0 OUT=forecast1 NOPRINT;
16 ESTIMATE METHOD=CLS p=2 q=3 NOPRINT;
17 FORECAST LEAD=0 OUT=forecast2 NOPRINT;
18
19 /* Compute empirical autocorrelations of estimated residuals */
20 PROC ARIMA DATA=forecast1;
21 IDENTIFY VAR=residual NLAG=150 OUTCOV=cov1 NOPRINT;
22 RUN;
23
24 PROC ARIMA DATA=forecast2;
25 IDENTIFY VAR=residual NLAG=150 OUTCOV=cov2 NOPRINT;
26 RUN;
7.6 Diagnostic Check 319
27
28 /* Graphical options */
29 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5;
30 AXIS1 LABEL=(ANGLE=90 ’Residual Autocorrelation’);
31 AXIS2 LABEL=(’Lag’) ORDER=(0 TO 150 BY 10);
32
33 /* Plot empirical autocorrelation function of estimated residuals */
34 PROC GPLOT DATA=cov1;
35 PLOT corr*LAG=1 / VREF=0 VAXIS=AXIS1 HAXIS=AXIS2;
36 RUN;
37
38 PROC GPLOT DATA=cov2;
39 PLOT corr*LAG=1 / VREF=0 VAXIS=AXIS1 HAXIS=AXIS2;
40 RUN; QUIT;
One-step forecasts of the adjusted series are written together with the estimated resid-
ỹ1 , . . . ỹn are computed by means of the uals into the data file notated after the option
FORECAST statement in the ARIMA procedure. OUT. Finally, the empirical autocorrelations of
LEAD=0 prevents future forecasts beyond the the estimated residuals are computed by PROC
sample size 7300. The corresponding forecasts ARIMA and plotted by PROC GPLOT.
To Chi- Pr >
Lag Square DF ChiSq --------------Autocorrelations-----------------
To Chi- Pr >
Lag Square DF ChiSq --------------Autocorrelations-----------------
7 /* Test preparation */
8 DATA donau2;
9 MERGE donau seasad(KEEP=SA);
10
11 /* Test for white noise */
12 PROC ARIMA DATA=donau2;
13 IDENTIFY VAR=sa NLAG=60 NOPRINT;
14 ESTIMATE METHOD=CLS p=3 q=2 WHITENOISE=IGNOREMISS;
15 FORECAST LEAD=0 OUT=out32(KEEP=residual RENAME=residual=res32);
16 ESTIMATE METHOD=CLS p=2 q=3 WHITENOISE=IGNOREMISS;
17 FORECAST LEAD=0 OUT=out23(KEEP=residual RENAME=residual=res23);
18 RUN; QUIT;
The Portmanteau-test of Box–Ljung is initiated residuals are also written in the files out32 and
by the WHITENOISE=IGNOREMISS option in out23 for further use.
the ESTIMATE statement of PROC ARIMA. The
For both models, the Listings 7.6.3 and 7.6.3 indicate sufficiently great
p-values for the Box–Ljung test statistic up to lag 54, letting us pre-
sume that the residuals are representations of a white noise process.
7.6 Diagnostic Check 321
Overfitting
The third diagnostic class is the method of overfitting. To verify the
fitting of an ARMA(p, q)-model, slightly more comprehensive models
with additional parameters are estimated, usually an ARMA(p, q +
1)- and ARMA(p + 1, q)-model. It is then expected that these new
additional parameters will be close to zero, if the initial ARMA(p, q)-
model is adequate.
The original ARMA(p, q)-model should be considered hazardous or
critical, if one of the following aspects occur in the more comprehen-
sive models:
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
SAS computes the model coefficients for the chosen orders (3, 3), (2, 4)
and (4, 2) by means of the conditional least squares method and pro-
vides the ARMA(3, 3)-model
Yt − 1.57265Yt−1 + 0.54576Yt−2 + 0.03884Yt−3
= εt − 0.29901εt−1 − 0.37478εt−2 − 0.12273εt−3 ,
the ARMA(4, 2)-model
Yt − 2.07048Yt−1 + 1.46595Yt−2 − 0.47350Yt−3 + 0.08461Yt−4
= εt − 0.79694εt−1 − 0.08912εt−2
as well as the ARMA(2, 4)-model
Yt − 1.63540Yt−1 + 0.64655Yt−2
= εt − 0.36178εt−1 − 0.35398εt−2 − 0.10115εt−3 + 0.0069054εt−4 .
324 The Box–Jenkins Program: A Case Study
since the ’old’ parameters in the ARMA(4, 2)- and ARMA(3, 3)-model
differ extremely from those inherent in the above ARMA(3, 2)-model
and the new added parameter −0.12273 in the ARMA(3, 3)-model is
not close enough to zero, as can be seen by the corresponding p-value.
Whereas the adjusted ARMA(2, 3)-model
Yt = − 1.61943Yt−1 + 0.63143Yt−2
= εt − 0.34553εt−1 − 0.34821εt−2 − 0.10327εt−3 (7.59)
7.7 Forecasting
To justify forecasts based on the apparently adequate ARMA(2, 3)-
model (7.59), we have to verify its forecasting ability. It can be
achieved by either ex-ante or ex-post best one-step forecasts of the
sample Y1 , . . . ,Yn . Latter method already has been executed in the
residual examination step. We have shown that the estimated forecast
errors ε̂t := ỹt − ŷt , t = 1, . . . , n can be regarded as a realization of a
white noise process. The ex-ante forecasts are based on the first n−m
observations ỹ1 , . . . , ỹn−m , i.e., we adjust an ARMA(2, 3)-model to the
reduced time series ỹ1 , . . . , ỹn−m . Afterwards, best one-step forecasts
ŷt of Yt are estimated for t = 1, . . . , n, based on the parameters of this
new ARMA(2, 3)-model.
Now, if the ARMA(2, 3)-model (7.59) is actually adequate, we will
again expect that the estimated forecast errors ε̂t = ỹt − ŷt , t = n −
m + 1, . . . , n behave like realizations of a white noise process, where
7.7 Forecasting 325
M = 3467
Max(P(*)) 44294.13
Sum(P(*)) 9531509
Kappa 16.11159
M = 182
Max(P(*)) 18698.19
Sum(P(*)) 528566
Kappa 6.438309
M-1 2584
Max(P(*)) 24742.39
Sum(P(*)) 6268989
Kappa 10.19851
M-1 2599
Max(P(*)) 39669.63
Sum(P(*)) 6277826
Kappa 16.4231
70 VAR P_01;
71 OUTPUT OUT=psum&k SUM=psum;
72 RUN;
73
74 /* Compute empirical distribution function of cumulated periodogram
,→and its confidence bands */
75 DATA conf&k;
76 SET periodo&k(FIRSTOBS=2);
77 IF _N_=1 THEN SET psum&k;
78 RETAIN s 0;
79 s=s+P_01/psum;
80 fm=_N_/(_FREQ_-1);
81 yu_01=fm+1.63/SQRT(_FREQ_-1);
82 yl_01=fm-1.63/SQRT(_FREQ_-1);
83 yu_05=fm+1.36/SQRT(_FREQ_-1);
84 yl_05=fm-1.36/SQRT(_FREQ_-1);
85
86 /* Graphical options */
87 SYMBOL3 V=NONE I=STEPJ C=GREEN;
88 SYMBOL4 V=NONE I=JOIN C=RED L=2;
89 SYMBOL5 V=NONE I=JOIN C=RED L=1;
90 AXIS5 LABEL=(’x’) ORDER=(.0 TO 1.0 BY .1);
91 AXIS6 LABEL=NONE;
92
93 /* Plot empirical distribution function of cumulated periodogram with
,→its confidence bands */
94 PROC GPLOT DATA=conf&k;
95 PLOT fm*s=3 yu_01*fm=4 yl_01*fm=4 yu_05*fm=5 yl_05*fm=5 / OVERLAY
,→HAXIS=AXIS5 VAXIS=AXIS6;
96 RUN;
97 %MEND;
98
99 %wn(1);
100 %wn(2);
101
102 /* Merge estimated residuals and estimated forecast errors */
103 DATA residuals;
104 MERGE residual1(KEEP=residual) residual2(KEEP=residual RENAME=(
,→residual=forecasterror));
105
106 /* Compute standard deviations */
107 PROC MEANS DATA=residuals;
108 VAR residual forecasterror;
109 RUN;
110
111 %wn(3);
112 %wn(4);
113
114 QUIT;
332 The Box–Jenkins Program: A Case Study
In the first DATA step, the adjusted discharge of the residuals and forecast errors. The resid-
values pertaining to the discharge values mea- uals coincide with the one-step forecasts errors
sured during the last year are removed from of the first 6935 observations, whereas the fore-
the sample. In the framework of the proce- cast errors are given by one-step forecasts er-
dure PROC ARIMA, an ARMA(2, 3)-model is rors of the last 365 observations. Visual repre-
adjusted to the reduced sample. The cor- sentation is created by PROC GPLOT.
responding estimated model coefficients and The macro-step then executes Fisher’s test for
further model details are written into the file hidden periodicities as well as the Bartlett–
’model’, by the option OUTMODEL. Best one- Kolmogorov–Smirnov test for both residuals
step forecasts of the ARMA(2, 3)-model are and forecast errors within the procedure PROC
then computed by PROC ARIMA again, where SPECTRA. After renaming, they are written into
the model orders, model coefficients and model a single file. PROC MEANS computes their stan-
mean are specified in the ESTIMATE state- dard deviations for a direct comparison.
ment, where the option NOEST suppresses a Afterwards tests for white noise are computed
new model estimation. for other numbers of estimated residuals by the
The following steps deal with the examination macro.
The ex-ante forecasts are computed from the dataset without the
last 365 observations. Plot 7.7.1b of the estimated forecast errors
ε̂t = ỹt − ŷt , t = n − m + 1, . . . , n indicates an accidental pattern.
Applying Fisher’s test as well as Bartlett-Kolmogorov-Smirnov’s test
for white noise, as introduced in Section 6.1, the p-value of 0.0944
and Fisher’s κ-statistic of 6.438309 do not reject the hypothesis that
the estimated forecast errors are generated from a white noise pro-
cess (εt )t∈Z , where the εt are independent and identically normal dis-
tributed (Listing 7.7.1e). Whereas, testing the estimated residuals for
white noise behavior yields a conflicting result. Bartlett-Kolmogorov-
Smirnov’s test clearly doesn’t reject the hypothesis that the estimated
residuals are generated from a white noise process (εt )t∈Z , where the
εt are independent and identically normal distributed, due to the
great p-value of 0.9978 (Listing 7.7.1c). On the other hand, the
conservative test of Fisher produces an irritating, high κ-statistic of
16.11159, which would lead to a rejection of the hypothesis. Further
examinations with Fisher’s test lead to the κ-statistic of 10.19851,
when testing the first 5170 estimated residuals, and it leads to the
κ-statistic of 16.4231, when testing the first 5200 estimated residu-
als (Listing 7.7.1h and 7.7.1i). Applying the result of Exercise 6.12
P {κm > x} ≈ 1 − exp(−m e−x ), we obtain the approximate p-values
of 0.09 and < 0.001, respectively. That is, Fisher’s test doesn’t reject
the hypothesis for the first 5170 estimated residuals but it clearly re-
7.7 Forecasting 333
jects the hypothesis for the first 5200 ones. This would imply that
the additional 30 values cause such a great periodical influence such
that a white noise behavior is strictly rejected. In view of the small
number of 30, relatively to 5170, Plot 7.7.1 and the great p-value
0.9978 of Bartlett-Kolmogorov-Smirnov’s test we can’t trust Fisher’s
κ-statistic. Recall, as well, that the Portmanteau-test of Box–Ljung
has not rejected the hypothesis that the 7300 estimated residuals re-
sult from a white noise, carried out in Section 7.6. Finally, Listing
7.7.1g displays that the standard deviations 37.1 and 38.1 of the esti-
mated residuals and estimated forecast errors, respectively, are close
to each other.
Hence, the validity of the ARMA(2, 3)-model (7.59) has been estab-
lished, justifying us to predict future values using best h-step fore-
casts Ŷn+h based on this model. Since the model isP invertible, it can
be almost surely rewritten as AR(∞)-process Yt = u>0 cu Yt−u + εt .
Consequently,
Ph−1 the best h-step forecast Ŷn+h of Yn+h can be estimated
Pt+h−1
by ŷn+h = u=1 cu ŷn+h−u + v=h cv ỹt+h−v , cf. Theorem 2.3.4, where
the estimated forecasts ŷn+i , i = 1, . . . , h are computed recursively.
(o)
An estimated best h-step forecast ŷn+h of the original Donauwoerth
time series y1 , . . . , yn is then obviously given by
(o) (365) (2433)
ŷn+h := ŷn+h + µ + Ŝn+h + Ŝn+h , (7.60)
(365)
where µ denotes the arithmetic mean of y1 , . . . , yn and where Ŝn+h
(2433)
and Ŝn+h are estimations of the seasonal nonrandom components
at lag n + h pertaining to the cycles with periods of 365 and 2433
days, respectively. These estimations already have been computed
by Program 7.4.2. Note that we can only make reasonable forecasts
for the next few days, as ŷn+h approaches its expectation zero for
increasing h.
334 The Box–Jenkins Program: A Case Study
1 /* donauwoerth_final.sas */
2 TITLE1 ;
3 TITLE2 ;
4
5 /* Note that this program requires the file ’seasad’ and ’seasad1’
,→generated by program donauwoerth_adjustment.sas */
6
7 /* Computations of forecasts for next 31 days */
8 PROC ARIMA DATA=seasad;
9 IDENTIFY VAR=sa NLAG=300 NOPRINT;
10 ESTIMATE METHOD=CLS p=2 q=3 OUTMODEL=model NOPRINT;
11 FORECAST LEAD=31 OUT=forecast;
12 RUN;
13
14 /* Plot preparations */
15 DATA forecast(DROP=residual sa);
16 SET forecast(FIRSTOBS=7301);
17 N=_N_;
18 date=MDY(1,N,2005);
19 FORMAT date ddmmyy10.;
20
28
29
30 DATA plotforecast;
31 SET donau seascomp;
32
33 /* Graphical display */
34 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5;
35 SYMBOL2 V=DOT I=JOIN H=0.4 C=BLUE W=1 L=1;
36 AXIS1 LABEL=NONE ORDER=(’01OCT04’d ’01NOV04’d ’01DEC04’d ’01JAN05’d
,→’01FEB05’d);
37 AXIS2 LABEL=(ANGLE=90 ’Original series with forecasts’) ORDER=(0 to
,→300 by 100);
38 LEGEND1 LABEL=NONE VALUE=(’Original series’ ’Forecasts’ ’Lower 95-
,→percent confidence limit’ ’Upper 95-percent confidence limit’);
39
40 PROC GPLOT DATA=plotforecast(FIRSTOBS=7209);
41 PLOT discharge*date=1 forecast*date=2 / OVERLAY HAXIS=AXIS1 VAXIS=
,→AXIS2 LEGEND=LEGEND1;
42 RUN; QUIT;
The best h-step forecasts for the next month forecasts of the original time series are at-
are computed within PROC ARIMA by the tained. The procedure PROC GPLOT then cre-
FORECAST statement and the option LEAD. ates a graphical visualization of the best h-step
Adding the corresponding seasonal compo- forecasts.
nents and the arithmetic mean (see (7.60)),
Exercises
7.1. Show the following generalization of Lemma 7.3.1: Consider two
sequences (Yn )n∈N and (Xn )n∈N of real valued random k-vectors,
all defined on the same probability space (Ω, A, P) and a sequence
336 The Box–Jenkins Program: A Case Study
(hn )n∈N of positive real valued numbers, such that Yn = O P (hn ) and
Xn = oP (1). Then, Yn Xn = oP (hn ).
7.3. (IQ Data) Apply the Box–Jenkins program to the IQ Data. This
dataset contains a monthly index of quality ranging from 0 to 100.
Bibliography
Akaike, H. (1977). On entropy maximization principle. Applications
of Statistics, Amsterdam, pages 27–41.
Shiskin, J., Young, A., and Musgrave, J. (1967). The x–11 variant of
census method ii seasonal adjustment program. Technical paper 15,
Bureau of the Census, U.S. Dept. of Commerce.
Tintner, G. (1958). Eine neue methode für die schätzung der logistis-
chen funktion. Metrika, 1:154–157.
Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
http://fsf.org/
Everyone is permitted to copy and distribute verbatim copies of this license document, but
changing it is not allowed.
a Secondary Section may not explain any math- machine-generated HTML, PostScript or PDF
ematics.) The relationship could be a matter of produced by some word processors for output
historical connection with the subject or with purposes only.
related matters, or of legal, commercial, philo- The “Title Page” means, for a printed book,
sophical, ethical or political position regarding the title page itself, plus such following pages
them. as are needed to hold, legibly, the material this
The “Invariant Sections” are certain Sec- License requires to appear in the title page. For
ondary Sections whose titles are designated, as works in formats which do not have any title
being those of Invariant Sections, in the notice page as such, “Title Page” means the text near
that says that the Document is released under the most prominent appearance of the work’s ti-
this License. If a section does not fit the above tle, preceding the beginning of the body of the
definition of Secondary then it is not allowed to text.
be designated as Invariant. The Document may The “publisher” means any person or entity
contain zero Invariant Sections. If the Docu- that distributes copies of the Document to the
ment does not identify any Invariant Sections public.
then there are none. A section “Entitled XYZ” means a named
subunit of the Document whose title either
The “Cover Texts” are certain short passages
is precisely XYZ or contains XYZ in paren-
of text that are listed, as Front-Cover Texts or
theses following text that translates XYZ in
Back-Cover Texts, in the notice that says that
another language. (Here XYZ stands for a
the Document is released under this License. A
specific section name mentioned below, such
Front-Cover Text may be at most 5 words, and
as “Acknowledgements”, “Dedications”,
a Back-Cover Text may be at most 25 words.
“Endorsements”, or “History”.) To
A “Transparent” copy of the Document means “Preserve the Title” of such a section when
a machine-readable copy, represented in a for- you modify the Document means that it re-
mat whose specification is available to the gen- mains a section “Entitled XYZ” according to
eral public, that is suitable for revising the doc- this definition.
ument straightforwardly with generic text edi- The Document may include Warranty Dis-
tors or (for images composed of pixels) generic claimers next to the notice which states that
paint programs or (for drawings) some widely this License applies to the Document. These
available drawing editor, and that is suitable for Warranty Disclaimers are considered to be in-
input to text formatters or for automatic trans- cluded by reference in this License, but only as
lation to a variety of formats suitable for input regards disclaiming warranties: any other im-
to text formatters. A copy made in an other- plication that these Warranty Disclaimers may
wise Transparent file format whose markup, or have is void and has no effect on the meaning of
absence of markup, has been arranged to thwart this License.
or discourage subsequent modification by read-
ers is not Transparent. An image format is not 2. VERBATIM COPYING
Transparent if used for any substantial amount
of text. A copy that is not “Transparent” is You may copy and distribute the Document
called “Opaque”. in any medium, either commercially or non-
Examples of suitable formats for Transparent commercially, provided that this License, the
copies include plain ASCII without markup, copyright notices, and the license notice say-
Texinfo input format, LaTeX input format, ing this License applies to the Document are
SGML or XML using a publicly available reproduced in all copies, and that you add no
DTD, and standard-conforming simple HTML, other conditions whatsoever to those of this Li-
PostScript or PDF designed for human modifi- cense. You may not use technical measures to
cation. Examples of transparent image formats obstruct or control the reading or further copy-
include PNG, XCF and JPG. Opaque formats ing of the copies you make or distribute. How-
include proprietary formats that can be read ever, you may accept compensation in exchange
and edited only by proprietary word processors, for copies. If you distribute a large enough num-
SGML or XML for which the DTD and/or pro- ber of copies you must also follow the conditions
cessing tools are not generally available, and the in section 3.
GNU Free Documentation Licence 353
You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements”.

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation.
If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy’s public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

11. RELICENSING

“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means any set of copyrightable works thus published on the MMC site.

“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

“Incorporate” means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is “eligible for relicensing” if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

    Copyright © YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with . . . Texts.” line with this:

    with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.