Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.3 or any later version
published by the Free Software Foundation; with no Invariant Sections, no Front-
Cover Texts, and no Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".
SAS and all other SAS Institute Inc. product or service names are registered trade-
marks or trademarks of SAS Institute Inc. in the USA and other countries. Windows
is a trademark and Microsoft is a registered trademark of the Microsoft Corporation.
The authors accept no responsibility for errors in the programs mentioned or for their
consequences.
Preface
The analysis of real data by means of statistical methods, with the aid
of a software package common in industry and administration, is usually
not an integral part of mathematics studies, but it will certainly be
part of future professional work.
The practical need for an investigation of time series data is exempli-
fied by the following plot, which displays the yearly sunspot numbers
between 1749 and 1924. These data are also known as the Wolf or
Wölfer (a student of Wolf) Data. For a discussion of these data and
further literature we refer to Wei and Reilly (1989), Example 6.2.5.
The present book links up elements from time series analysis with a selection
of statistical procedures used in general practice, including the statistical
software package SAS.
Yt = Tt + Zt + St + Rt , t = 1, . . . , n. (1.1)
Gt = Tt + Zt , (1.2)
MONTH T UNEMPLYD
July 1 60572
August 2 52461
September 3 47357
October 4 48320
November 5 60219
December 6 84418
January 7 119916
February 8 124350
March 9 87309
April 10 57035
May 11 39903
June 12 34053
July 13 29905
August 14 28068
September 15 26634
October 16 29259
November 17 38942
December 18 65036
January 19 110728
February 20 108931
March 21 71517
April 22 54428
May 23 42911
June 24 37123
July 25 33044
August 26 30755
September 27 28742
October 28 31968
November 29 41427
December 30 63685
January 31 99189
February 32 104240
March 33 75304
April 34 43622
May 35 33990
June 36 26819
July 37 25291
August 38 24538
September 39 22685
October 40 23945
November 41 28245
December 42 47017
January 43 90920
February 44 89340
March 45 47792
April 46 28448
May 47 19139
June 48 16728
July 49 16523
August 50 16622
September 51 15499
Listing 1.1.1: Unemployed1 Data.
/* unemployed1_listing.sas */
TITLE1 'Listing';
TITLE2 'Unemployed1 Data';

/* Read in the data (Data-step) */
DATA data1;
INFILE 'c:\data\unemployed1.txt';
INPUT month $ t unemplyd;

/* Print the data (Proc-step) */
PROC PRINT DATA = data1 NOOBS;
RUN; QUIT;
This program consists of two main parts, a DATA and a PROC step.
The DATA step started with the DATA statement creates a temporary dataset named data1. The purpose of INFILE is to link the DATA step to a raw dataset outside the program. The pathname of this dataset depends on the operating system; we will use the syntax of MS-DOS, which is most commonly known. INPUT tells SAS how to read the data. Three variables are defined here, where the first one contains character values. This is determined by the $ sign behind the variable name. For each variable one value per line is read from the source into the computer's memory.
The statement PROC procedurename DATA=filename; invokes a procedure that is linked to the data from filename. Without the option DATA=filename the most recently created file is used.
The PRINT procedure lists the data; it comes with numerous options that allow control of the variables to be printed out, 'dress up' of the display etc. The SAS internal observation number (OBS) is printed by default; NOOBS suppresses the column of observation numbers on each line of output. An optional VAR statement determines the order (from left to right) in which variables are displayed. If not specified (like here), all variables in the data set will be printed in the order they were defined to SAS. Entering RUN; at any point of the program tells SAS that a unit of work (DATA step or PROC) ended. SAS then stops reading the program and begins to execute the unit. The QUIT; statement at the end terminates the processing of SAS.
A line starting with an asterisk * and ending with a semicolon ; is ignored. These comment statements may occur at any point of the program except within raw data or another statement.
The TITLE statement generates a title. Its printing is actually suppressed here and in the following.
/* unemployed1_plot.sas */
TITLE1 'Plot';
TITLE2 'Unemployed1 Data';

/* Read in the data */
DATA data1;
INFILE 'c:\data\unemployed1.txt';
INPUT month $ t unemplyd;

/* Graphical Options */
AXIS1 LABEL=(ANGLE=90 'unemployed');
AXIS2 LABEL=('t');
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.4 W=1;

/* Plot the data */
PROC GPLOT DATA=data1;
PLOT unemplyd*t / VAXIS=AXIS1 HAXIS=AXIS2;
RUN; QUIT;
Variables can be plotted by using the GPLOT procedure, where the graphical output is controlled by numerous options.
The AXIS statements with the LABEL options control labelling of the vertical and horizontal axes. ANGLE=90 causes a rotation of the label of 90° so that it parallels the (vertical) axis in this example.
The SYMBOL statement defines the manner in which the data are displayed. V=DOT C=GREEN I=JOIN H=0.4 W=1 tell SAS to plot green dots of height 0.4 and to join them with a line of width 1. The PLOT statement in the GPLOT procedure is of the form PLOT y-variable*x-variable / options;, where the options here define the horizontal and the vertical axes.
/* logistic.sas */
TITLE1 'Plots of the Logistic Function';

/* Generate the data for different logistic functions */
DATA data1;
beta3=1;
DO beta1= 0.5, 1;
DO beta2=0.1, 1;
DO t=-10 TO 10 BY 0.5;
s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
f_log=beta3/(1+beta2*EXP(-beta1*t));
OUTPUT;
END;
END;
END;

/* Graphical Options */
SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
SYMBOL4 C=GREEN V=NONE I=JOIN L=33;
AXIS1 LABEL=(H=2 'f' H=1 'log' H=2 '(t)');
AXIS2 LABEL=('t');
LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ', b' H=1 '2' H=2 ',b' H=1 '3' H=2 ')=');
This means that there is a linear relationship among the values $1/f_{\mathrm{log}}(t)$. This
can serve as a basis for estimating the parameters $\beta_1, \beta_2, \beta_3$ by an
appropriate linear least squares approach, see Exercises 1.2 and 1.3.
In the following example we fit the logistic trend model (1.5) to the
population growth of the area of North Rhine-Westphalia (NRW),
which is a federal state of Germany.
The data consist of the population sizes $y_t$ in 5-year steps from 1935 to 1980 as well as their predicted values $\hat y_t$,
obtained from a least squares estimation as described in (1.4) for a
logistic model.
$$\hat y_t := \frac{\hat\beta_3}{1+\hat\beta_2\exp(-\hat\beta_1 t)} = \frac{21.5016}{1+1.1436\exp(-0.1675\,t)}$$
with the estimated saturation size β̂3 = 21.5016. The following plot
shows the data and the fitted logistic curve.
/* population1.sas */
TITLE1 'Population sizes and logistic fit';
TITLE2 'Population1 Data';

/* Read in the data */
DATA data1;
INFILE 'c:\data\population1.txt';
INPUT year t pop;

/* Compute parameters for fitted logistic function */
PROC NLIN DATA=data1 OUTEST=estimate;
MODEL pop=beta3/(1+beta2*EXP(-beta1*t));
PARAMETERS beta1=1 beta2=1 beta3=20;
RUN;

/* Generate fitted logistic function */
DATA data2;
SET estimate(WHERE=(_TYPE_='FINAL'));
DO t1=0 TO 11 BY 0.2;
f_log=beta3/(1+beta2*EXP(-beta1*t1));
OUTPUT;
END;

/* Merge data sets (as described in the explanation below) */
DATA data3;
MERGE data1 data2;

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'population in millions');
AXIS2 LABEL=('t');
SYMBOL1 V=DOT C=GREEN I=NONE;
SYMBOL2 V=NONE C=GREEN I=JOIN W=1;

/* Plot data with fitted function */
PROC GPLOT DATA=data3;
PLOT pop*t=1 f_log*t1=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
RUN; QUIT;
The procedure NLIN fits nonlinear regression models by least squares. The OUTEST option names the data set to contain the parameter estimates produced by NLIN. The MODEL statement defines the prediction equation by declaring the dependent variable and defining an expression that evaluates predicted values. A PARAMETERS statement must follow the PROC NLIN statement. Each parameter=value expression specifies the starting values of the parameter. Using the final estimates of PROC NLIN by the SET statement in combination with the WHERE data set option, the second data step generates the fitted logistic function values. The options in the GPLOT statement cause the data points and the predicted function to be shown in one plot, after they were stored together in a new data set data3 merging data1 and data2 with the MERGE statement.
/* gompertz.sas */
TITLE1 'Gompertz curves';

/* Generate the data for different Gompertz functions */
DATA data1;
beta1=1;
DO beta2=-1, 1;
DO beta3=0.05, 0.5;
DO t=0 TO 4 BY 0.05;
s=COMPRESS('(' || beta1 || ',' || beta2 || ',' || beta3 || ')');
f_g=EXP(beta1+beta2*beta3**t);
OUTPUT;
END;
END;
END;

/* Graphical Options */
SYMBOL1 C=GREEN V=NONE I=JOIN L=1;
SYMBOL2 C=GREEN V=NONE I=JOIN L=2;
SYMBOL3 C=GREEN V=NONE I=JOIN L=3;
SYMBOL4 C=GREEN V=NONE I=JOIN L=33;
AXIS1 LABEL=(H=2 'f' H=1 'G' H=2 '(t)');
AXIS2 LABEL=('t');
LEGEND1 LABEL=(F=CGREEK H=2 '(b' H=1 '1' H=2 ',b' H=1 '2' H=2 ',b' H=1 '3' H=2 ')=');
We obviously have in this case the linear regression model
$$\log(y_t) = \log(\beta_2) + \beta_1\log(t) + \varepsilon_t.$$
The least squares estimates of $\beta_1$ and $\log(\beta_2)$ in this linear regression model are (see, for example, Falk et al., 2002, Theorem 3.2.2)
$$\hat\beta_1 = \frac{\sum_{t=1}^{10}\bigl(\log(t)-\overline{\log(t)}\bigr)\bigl(\log(y_t)-\overline{\log(y)}\bigr)}{\sum_{t=1}^{10}\bigl(\log(t)-\overline{\log(t)}\bigr)^2} = 1.019,$$
where $\overline{\log(t)} := \frac{1}{10}\sum_{t=1}^{10}\log(t) = 1.5104$ and $\overline{\log(y)} := \frac{1}{10}\sum_{t=1}^{10}\log(y_t) = 0.7849$, and hence
$$\widehat{\log(\beta_2)} = \overline{\log(y)} - \hat\beta_1\,\overline{\log(t)} = -0.7549.$$
We estimate $\beta_2$ therefore by
$$\hat\beta_2 := \exp\bigl(\widehat{\log(\beta_2)}\bigr) = \exp(-0.7549) \approx 0.4700.$$
t     y_t − ŷ_t
1      0.0159
2      0.0201
3     −0.1176
4     −0.0646
5      0.1430
6      0.1017
7     −0.1583
8     −0.2526
9     −0.0942
10     0.5662

Table 1.1.3: Residuals y_t − ŷ_t of model (1.11).
Table 1.1.3 lists the residuals yt − ŷt by which one can judge the
goodness of fit of the model (1.11).
A popular measure for assessing the fit is the squared multiple corre-
lation coefficient or R2 -value
$$R^2 := 1 - \frac{\sum_{t=1}^{n}(y_t-\hat y_t)^2}{\sum_{t=1}^{n}(y_t-\bar y)^2}. \tag{1.12}$$
Note that the residuals $\tilde y_t - \hat{\tilde y}_t = y_t - \hat y_t$ are not influenced by adding
the constant 5.178 to $y_t$. The above models might help to judge the
average tax payer's situation between 1960 and 1970 and to predict
his future one. It is apparent from the residuals in Table 1.1.3 that
the net income yt is an almost perfect multiple of t for t between 1
and 9, whereas the large increase y10 in 1970 seems to be an outlier.
Actually, in 1969 the German government had changed and in 1970 a
long strike in Germany caused an enormous increase in the income of
civil servants.
Yt = Tt + St + Rt , t = 1, 2, . . . (1.13)
Linear Filters
Let a−r , a−r+1 , . . . , as be arbitrary real numbers, where r, s ≥ 0, r +
s + 1 ≤ n. The linear transformation
$$Y_t^* := \sum_{u=-r}^{s} a_u Y_{t-u},\qquad t = s+1,\dots,n-r,$$
$$Y_{t+1}^* = Y_t^* + \frac{1}{2s+1}\bigl(Y_{t+s+1} - Y_{t-s}\bigr).$$
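This recursion is immediate from the definition of the simple moving average of order $2s+1$: consecutive averaging windows differ only in their outermost observations, so that
$$Y_{t+1}^* - Y_t^* = \frac{1}{2s+1}\Bigl(\sum_{u=-s}^{s}Y_{t+1+u} - \sum_{u=-s}^{s}Y_{t+u}\Bigr) = \frac{1}{2s+1}\bigl(Y_{t+s+1}-Y_{t-s}\bigr).$$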
This filter is a particular example of a low-pass filter, which preserves
the slowly varying trend component of a series but removes from it the
rapidly fluctuating or high frequency component. There is a trade-off
between the two requirements that the irregular fluctuation should be
reduced by a filter, thus leading, for example, to a large choice of s in
a simple moving average, and that the long term variation in the data
should not be distorted by oversmoothing, i.e., by a too large choice
of s. Suppose, for example, a time series $Y_t = T_t + R_t$ without a seasonal component.
/* females.sas */
TITLE1 'Simple Moving Average of Order 17';
TITLE2 'Unemployed Females Data';

/* Read in the data and generate SAS-formatted date */
DATA data1;
INFILE 'c:\data\female.txt';
INPUT upd @@;
date=INTNX('month','01jan61'd, _N_-1);
FORMAT date yymon.;

/* Compute the simple moving averages of order 17 */
PROC EXPAND DATA=data1 OUT=data2 METHOD=NONE;
ID date;
CONVERT upd=ma17 / TRANSFORM=(CMOVAVE 17);

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'Unemployed Females');
AXIS2 LABEL=('Date');
SYMBOL1 V=DOT C=GREEN I=JOIN H=.5 W=1;
SYMBOL2 V=STAR C=GREEN I=JOIN H=.5 W=1;
LEGEND1 LABEL=NONE VALUE=('Original data' 'Moving average of order 17');

/* Plot the data together with the simple moving average */
PROC GPLOT DATA=data2;
PLOT upd*date=1 ma17*date=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;

RUN; QUIT;
In the data step the values for the variable upd are read from an external file. The option @@ allows SAS to read the data over line breaks in the original txt-file.
By means of the function INTNX, a new variable in a date format is generated, containing monthly data starting from the 1st of January 1961. The temporarily created variable _N_, which counts the number of cases, is used to determine the distance from the starting value. The FORMAT statement attributes the format yymon to this variable, consisting of four digits for the year and three for the month.
The SAS procedure EXPAND computes simple moving averages and stores them in the file specified in the OUT= option. EXPAND is also able to interpolate series. For example, if one has a quarterly series and wants to turn it into monthly data, this can be done by the method stated in the METHOD= option. Since we do not wish to do this here, we choose METHOD=NONE. The ID variable specifies the time index, in our case the date, by which the observations are ordered. The CONVERT statement now computes the simple moving average. The syntax is original=smoothed variable. The smoothing method is given in the TRANSFORM option. CMOVAVE number specifies a simple moving average of order number. Remark that for the values at the boundary the arithmetic mean of the data within the moving window is computed as the simple moving average. This is an extension of our definition of a simple moving average. Also other smoothing methods can be specified in the TRANSFORM statement, like the exponential smoother with smoothing parameter alpha (see page 33ff.) by EWMA alpha.
The smoothed values are plotted together with the original series against the date in the final step.
Seasonal Adjustment
A simple moving average of a time series Yt = Tt + St + Rt now
decomposes as
Yt∗ = Tt∗ + St∗ + Rt∗ ,
where St∗ is the pertaining moving average of the seasonal components.
Suppose, moreover, that St is a p-periodic function, i.e.,
St = St+p , t = 1, . . . , n − p.
Take for instance monthly average temperatures Yt measured at fixed
points, in which case it is reasonable to assume a periodic seasonal
component St with period p = 12 months. A simple moving average
of order p then yields a constant value St∗ = S, t = p, p + 1, . . . , n − p.
By adding this constant $S$ to the trend function $T_t$ and putting $T_t' :=
T_t + S$, we can assume in the following that $S = 0$. Thus we obtain
for the differences
Dt := Yt − Yt∗ ∼ St + Rt .
To estimate St we average the differences with lag p (note that they
vary around St ) by
$$\bar D_t := \frac{1}{n_t}\sum_{j=0}^{n_t-1} D_{t+jp} \sim S_t,\quad t=1,\dots,p;\qquad \bar D_t := \bar D_{t-p}\ \text{for}\ t > p,$$
where nt is the number of periods available for the computation of D̄t .
Thus,
$$\hat S_t := \bar D_t - \frac{1}{p}\sum_{j=1}^{p}\bar D_j \sim S_t - \frac{1}{p}\sum_{j=1}^{p}S_j = S_t \tag{1.14}$$
is an estimator of $S_t = S_{t+p} = S_{t+2p} = \dots$ satisfying
$$\frac{1}{p}\sum_{j=0}^{p-1}\hat S_{t+j} = 0 = \frac{1}{p}\sum_{j=0}^{p-1}S_{t+j}.$$
Month       d_t (rounded values)                  d̄_t (rounded)  ŝ_t (rounded)
            1976     1977     1978     1979
January     53201    56974    48469    52611      52814          53136
February    59929    54934    54102    51727      55173          55495
March       24768    17320    25678    10808      19643          19966
April       -3848       42    -5429        –      -3079          -2756
May        -19300   -11680   -14189        –     -15056         -14734
June       -23455   -17516   -20116        –     -20362         -20040
July       -26413   -21058   -20605        –     -22692         -22370
August     -27225   -22670   -20393        –     -23429         -23107
September  -27358   -24646   -20478        –     -24161         -23839
October    -23967   -21397   -17440        –     -20935         -20612
November   -14300   -10846   -11889        –     -12345         -12023
December    11540    12213     7923        –      10559          10881
Table 1.2.1: Table of dt , d¯t and of estimates ŝt of the seasonal compo-
nent St in the Unemployed1 Data.
/* temperatures.sas */
TITLE1 'Original and seasonally adjusted data';
TITLE2 'Temperatures data';

/* Read in the data and generate SAS-formatted date */
DATA temperatures;
INFILE 'c:\data\temperatures.txt';
INPUT temperature;
date=INTNX('month','01jan95'd,_N_-1);
FORMAT date yymon.;

/* Make seasonal adjustment */
PROC TIMESERIES DATA=temperatures OUT=series SEASONALITY=12 OUTDECOMP=deseason;
VAR temperature;
DECOMP /MODE=ADD;

/* Merge necessary data for plot */
DATA plotseries;
MERGE temperatures deseason(KEEP=SA);

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'temperatures');
AXIS2 LABEL=('Date');
$$Y_t = T_t + S_t + R_t$$

Plot 1.2.3: Plot of the Unemployed1 Data $y_t$ and of $y_t^{(2)}$, seasonally adjusted by the X-11 procedure.
/* unemployed1_x11.sas */
TITLE1 'Original and X-11 seasonal adjusted data';
TITLE2 'Unemployed1 Data';

/* Read in the data and generate SAS-formatted date */
DATA data1;
INFILE 'c:\data\unemployed1.txt';
INPUT month $ t upd;
date=INTNX('month','01jul75'd, _N_-1);
FORMAT date yymon.;

/* Apply X-11-Program */
PROC X11 DATA=data1;
MONTHLY DATE=date ADDITIVE;
VAR upd;
OUTPUT OUT=data2 B1=upd D11=updx11;

/* Graphical options */
AXIS1 LABEL=(ANGLE=90 'unemployed');
AXIS2 LABEL=('Date');
SYMBOL1 V=DOT C=GREEN I=JOIN H=1 W=1;
SYMBOL2 V=STAR C=GREEN I=JOIN H=1 W=1;
LEGEND1 LABEL=NONE VALUE=('original' 'adjusted');
If we differentiate the left hand side with respect to each βj and set
the derivatives equal to zero, we see that the minimizers satisfy the
p + 1 linear equations
$$\beta_0\sum_{u=-k}^{k}u^j + \beta_1\sum_{u=-k}^{k}u^{j+1} + \dots + \beta_p\sum_{u=-k}^{k}u^{j+p} = \sum_{u=-k}^{k}u^j y_{t+u},\qquad j = 0,\dots,p,$$
which can be written in matrix form as
$$X^T X\beta = X^T y \tag{1.16}$$
where
$$X = \begin{pmatrix} 1 & -k & (-k)^2 & \dots & (-k)^p\\ 1 & -k+1 & (-k+1)^2 & \dots & (-k+1)^p\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & k & k^2 & \dots & k^p \end{pmatrix}. \tag{1.17}$$
Difference Filter
We have already seen that we can remove a periodic seasonal compo-
nent from a time series by utilizing an appropriate linear filter. We
will next show that also a polynomial trend function can be removed
by a suitable linear filter.
Lemma 1.2.6. For a polynomial f (t) := c0 + c1 t + · · · + cp tp of degree
p, the difference
∆f (t) := f (t) − f (t − 1)
is a polynomial of degree at most p − 1.
Proof. The assertion is an immediate consequence of the binomial
expansion
$$(t-1)^p = \sum_{k=0}^{p}\binom{p}{k}t^k(-1)^{p-k} = t^p - pt^{p-1} + \dots + (-1)^p.$$
The first-order difference is $\Delta Y_t = Y_t - Y_{t-1}$, and higher-order differences are defined recursively by $\Delta^p Y_t = \Delta(\Delta^{p-1}Y_t)$, $t = p,\dots,n$. For the second-order difference we obtain
$$\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1} = Y_t - Y_{t-1} - Y_{t-1} + Y_{t-2} = Y_t - 2Y_{t-1} + Y_{t-2}.$$
Plot 1.2.4: Annual electricity output, first and second order differ-
ences.
/* electricity_differences.sas */
TITLE1 'First and second order differences';
TITLE2 'Electricity Data';
/* Note that this program requires the macro mkfields.sas to be submitted before this program */

/* ... (data step reading the data and computing delta1 omitted) ... */
delta2=DIF(delta1);

/* Graphical options */
AXIS1 LABEL=NONE;
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;

/* Generate three plots */
GOPTIONS NODISPLAY;
PROC GPLOT DATA=data1 GOUT=fig;
PLOT sum*year / VAXIS=AXIS1 HAXIS=AXIS2;
PLOT delta1*year / VAXIS=AXIS1 VREF=0;
PLOT delta2*year / VAXIS=AXIS1 VREF=0;
RUN;
Exponential Smoother
Let $Y_0,\dots,Y_n$ be a time series and let $\alpha\in[0,1]$ be a constant. The linear filter
$$Y_t^* := \alpha Y_t + (1-\alpha)Y_{t-1}^*,\qquad t = 1,\dots,n,$$
with $Y_0^* := Y_0$ is called exponential smoother. If the mean of the series shifts from $\mu$ to $\lambda$ at some time $N$, one computes for $t\ge N$
$$E(Y_t^*) = \lambda\bigl(1-(1-\alpha)^{t-N+1}\bigr) + \mu\Bigl((1-\alpha)^{t-N+1}\bigl(1-(1-\alpha)^{N-1}\bigr) + (1-\alpha)^t\Bigr) \longrightarrow_{t\to\infty} \lambda, \tag{1.21}$$
i.e., the exponential smoother adapts to the new level $\lambda$.
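The recursion can also be carried out step by step in a DATA step. The following sketch (data set and variable names are illustrative and assume a series y in a data set data1, with α = 0.3) uses the RETAIN statement to keep the previous smoothed value; PROC EXPAND with TRANSFORM=(EWMA 0.3) computes the same smoothing, as remarked in the discussion of Program 1.2.1 (females.sas).

/* exponential_smoother.sas (illustrative sketch) */
DATA smooth;
SET data1;                      /* assumes a data set data1 with variable y */
RETAIN ystar;                   /* keep the previous value Y*_{t-1} */
IF _N_=1 THEN ystar=y;          /* initialize Y*_0 := Y_0 */
ELSE ystar=0.3*y+0.7*ystar;     /* Y*_t = alpha*Y_t + (1-alpha)*Y*_{t-1} */
RUN;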
/* sunspot_correlogram.sas */
TITLE1 'Correlogram of first order differences';
TITLE2 'Sunspot Data';

/* Read in the data, generate year of observation and
   compute first order differences */
DATA data1;
INFILE 'c:\data\sunspot.txt';
INPUT spot @@;
date=1748+_N_;
diff1=DIF(spot);

/* Compute autocorrelation function */
PROC ARIMA DATA=data1;
IDENTIFY VAR=diff1 NLAG=49 OUTCOV=corr NOPRINT;

/* Graphical options */
AXIS1 LABEL=('r(k)');
AXIS2 LABEL=('k') ORDER=(0 12 24 36 48) MINOR=(N=11);
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;

/* Plot autocorrelation function */
PROC GPLOT DATA=corr;
PLOT CORR*LAG / VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
RUN; QUIT;
In the data step, the raw data are read into the variable spot. The specification @@ suppresses the automatic line feed of the INPUT statement after every entry in each row, see also Program 1.2.1 (females.sas). The variable date and the first order differences of the variable of interest spot are calculated.
The following procedure ARIMA is a crucial one in time series analysis. Here we just need the autocorrelation of diff1, which will be calculated up to a lag of 49 (NLAG=49) by the IDENTIFY statement. The option OUTCOV=corr causes SAS to create a data set corr containing among others the variables LAG and CORR. These two are used in the following GPLOT procedure to obtain a plot of the autocorrelation function. The ORDER option in the AXIS2 statement specifies the values to appear on the horizontal axis as well as their order, and the MINOR option determines the number of minor tick marks between two major ticks. VREF=0 generates a horizontal reference line through the value 0 on the vertical axis.
|ρ(k)| ≤ 1 = ρ(0).
/* airline_plot.sas */
TITLE1 'Monthly totals from January 49 to December 60';
TITLE2 'Airline Data';

/* Read in the data */
DATA data1;
INFILE 'c:\data\airline.txt';
INPUT y;
t=_N_;

/* Graphical options */
AXIS1 LABEL=NONE ORDER=(0 12 24 36 48 60 72 84 96 108 120 132 144) MINOR=(N=5);
AXIS2 LABEL=(ANGLE=90 'total in thousands');
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;
The variability of the data $y_t$ obviously increases with their level. The
log-transformed data $x_t = \log(y_t)$, displayed in the following figure,
however, show no such dependence of the variability on the level.
/* airline_log.sas */
TITLE1 'Logarithmic transformation';
TITLE2 'Airline Data';

/* Read in the data and compute log-transformed data */
DATA data1;
INFILE 'c:\data\airline.txt';
INPUT y;
t=_N_;
x=LOG(y);

/* Graphical options */
AXIS1 LABEL=NONE ORDER=(0 12 24 36 48 60 72 84 96 108 120 132 144) MINOR=(N=5);
AXIS2 LABEL=NONE;
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.2;

/* Plot log-transformed data */
PROC GPLOT DATA=data1;
PLOT x*t / HAXIS=AXIS1 VAXIS=AXIS2;
RUN; QUIT;
The plot of the log-transformed data is done in the same manner as for the original data in Program 1.3.2 (airline_plot.sas). The only differences are the log-transformation by means of the LOG function and the suppressed label on the vertical axis.
The fact that taking the logarithm of data often reduces their variability
can be illustrated as follows. Suppose, for example, that
the data were generated by random variables, which are of the form
Yt = σt Zt , where σt > 0 is a scale factor depending on t, and Zt ,
t ∈ Z, are independent copies of a positive random variable Z with
variance 1. The variance of Yt is in this case σt2 , whereas the variance
of log(Yt ) = log(σt ) + log(Zt ) is a constant, namely the variance of
log(Z), if it exists.
A transformation of the data, which reduces the dependence of the
variability on their height, is called variance stabilizing. The loga-
rithm is a particular case of the general Box–Cox (1964) transforma-
tion Tλ of a time series (Yt ), where the parameter λ ≥ 0 is chosen by
the statistician:
$$T_\lambda(Y_t) := \begin{cases} (Y_t^\lambda - 1)/\lambda, & Y_t \ge 0,\ \lambda > 0,\\ \log(Y_t), & Y_t > 0,\ \lambda = 0. \end{cases}$$
Note that $\lim_{\lambda\downarrow0} T_\lambda(Y_t) = T_0(Y_t) = \log(Y_t)$ if $Y_t > 0$ (Exercise 1.22).
Popular choices of the parameter λ are 0 and 1/2. A variance stabi-
lizing transformation of the data, if necessary, usually precedes any
further data manipulation such as trend or seasonal adjustment.
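As an illustrative sketch (assuming the Airline Data in a data set data1 with the positive variable y as above), the Box–Cox transformation with the popular choice λ = 1/2 reduces to one assignment in a DATA step:

/* boxcox.sas (illustrative sketch) */
DATA boxcox;
SET data1;                    /* assumes a data set data1 with positive variable y */
lambda=0.5;
ty=(y**lambda-1)/lambda;      /* T_lambda(y) for lambda > 0 */
RUN;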
Exercises
1.1. Plot the Mitscherlich function for different values of β1 , β2 , β3
using PROC GPLOT.
1.2. Put in the logistic trend model (1.5) zt := 1/yt ∼ 1/ E(Yt ) =
1/flog (t), t = 1, . . . , n. Then we have the linear regression model
zt = a + bzt−1 + εt , where εt is the error variable. Compute the least
squares estimates â, b̂ of a, b and motivate the estimates β̂1 := − log(b̂),
β̂3 := (1 − exp(−β̂1 ))/â as well as
$$\hat\beta_2 := \exp\Bigl(\frac{n+1}{2}\,\hat\beta_1 + \frac{1}{n}\sum_{t=1}^{n}\log\Bigl(\frac{\hat\beta_3}{y_t}-1\Bigr)\Bigr),$$
Year Unemployed
1950 1869
1960 271
1970 149
1975 1074
1980 889
1985 2304
1988 2242
1989 2038
1990 1883
1991 1689
1992 1808
1993 2270
1.33 0.94 0.79 0.83 0.61 0.77 0.93 0.97 1.20 1.33
1.36 1.55 0.95 0.59 0.61 0.83 1.06 1.21 1.16 1.01
0.97 1.02 1.04 0.98 1.07 0.88 1.28 1.25 1.09 1.31
1.26 1.10 0.81 0.92 0.90 0.93 0.94 0.92 0.85 0.77
0.83 1.08 0.87 0.84 0.88 0.99 0.99 1.10 1.32 1.00
0.80 0.58 0.38 0.49 1.02 1.19 0.94 1.01 1.00 1.01
1.07 1.08 1.04 1.21 1.33 1.53
1.21. Verify that the empirical correlation $r(k)$ at lag $k$ for the trend
$y_t = t$, $t = 1,\dots,n$, is given by
$$r(k) = 1 - 3\frac{k}{n} + 2\frac{k(k^2-1)}{n(n^2-1)},\qquad k = 0,\dots,n.$$
Plot the correlogram for different values of $n$. This example shows
that the correlogram has no interpretation for non-stationary processes
(see Exercise 1.20).
Stationary Processes
A stochastic process $(Y_t)_{t\in\mathbb{Z}}$ of square integrable complex valued random variables is said to be (weakly) stationary if for any $t_1, t_2, k \in\mathbb{Z}$
$$E(Y_{t_1}) = E(Y_{t_1+k})\quad\text{and}\quad E(Y_{t_1}\bar Y_{t_2}) = E(Y_{t_1+k}\bar Y_{t_2+k}).$$
The random variables of a stationary process (Yt )t∈Z have identical
means and variances. The autocovariance function satisfies moreover
for $s,t\in\mathbb{Z}$
$$\gamma(t,s) := \operatorname{Cov}(Y_t, Y_s) = \operatorname{Cov}(Y_{t-s}, Y_0) =: \gamma(t-s) = \operatorname{Cov}(Y_0, Y_{t-s}) = \operatorname{Cov}(Y_{s-t}, Y_0) = \gamma(s-t),$$
and thus, the autocovariance function of a stationary process can be
viewed as a function of a single argument satisfying γ(t) = γ(−t), t ∈
Z.
A stationary process $(\varepsilon_t)_{t\in\mathbb{Z}}$ of square integrable and uncorrelated real
valued random variables is called white noise, i.e., $\operatorname{Cov}(\varepsilon_t,\varepsilon_s) = 0$ for
$t\ne s$, and there exist $\mu\in\mathbb{R}$, $\sigma\ge0$ such that
$$E(\varepsilon_t) = \mu,\qquad E\bigl((\varepsilon_t-\mu)^2\bigr) = \sigma^2,\qquad t\in\mathbb{Z}.$$
In Section 1.2 we defined linear filters of a time series, which were
based on a finite number of real valued weights. In the following
we consider linear filters with an infinite number of complex valued
weights.
Suppose that $(\varepsilon_t)_{t\in\mathbb{Z}}$ is a white noise and let $(a_t)_{t\in\mathbb{Z}}$ be a sequence of complex numbers satisfying $\sum_{t=-\infty}^{\infty}|a_t| := \sum_{t\ge0}|a_t| + \sum_{t\ge1}|a_{-t}| < \infty$.
Then $(a_t)_{t\in\mathbb{Z}}$ is said to be an absolutely summable (linear) filter and
$$Y_t := \sum_{u=-\infty}^{\infty} a_u\varepsilon_{t-u} := \sum_{u\ge0}a_u\varepsilon_{t-u} + \sum_{u\ge1}a_{-u}\varepsilon_{t+u},\qquad t\in\mathbb{Z},$$
Theorem 2.1.4. The space $(L_2, \|\cdot\|_2)$ is complete, i.e., suppose that
Xn ∈ L2 , n ∈ N, has the property that for arbitrary ε > 0 one can find
an integer N (ε) ∈ N such that ||Xn −Xm ||2 < ε if n, m ≥ N (ε). Then
there exists a random variable X ∈ L2 such that limn→∞ ||X −Xn ||2 =
0.
$$P\{|Y_t - X(t)|\ge\varepsilon\} \le P\{|Y_t - X_n(t)| + |X_n(t) - X(t)|\ge\varepsilon\} \le P\{|Y_t - X_n(t)|\ge\varepsilon/2\} + P\{|X(t)-X_n(t)|\ge\varepsilon/2\} \longrightarrow_{n\to\infty} 0$$
$$E(|Z_t|^2) = E\bigl(|Z_t-\mu_Z+\mu_Z|^2\bigr) = E\bigl((Z_t-\mu_Z+\mu_Z)\overline{(Z_t-\mu_Z+\mu_Z)}\bigr) = E\bigl(|Z_t-\mu_Z|^2\bigr) + |\mu_Z|^2 = \gamma_Z(0) + |\mu_Z|^2$$
and, thus,
$$\sup_{t\in\mathbb{Z}} E(|Z_t|^2) < \infty.$$
$$\begin{aligned}\gamma_Y(t-s) &= E\bigl((Y_t-\mu_Y)\overline{(Y_s-\mu_Y)}\bigr)\\ &= \lim_{n\to\infty}\operatorname{Cov}\Bigl(\sum_{u=-n}^{n}a_uZ_{t-u},\ \sum_{w=-n}^{n}a_wZ_{s-w}\Bigr)\\ &= \lim_{n\to\infty}\sum_{u=-n}^{n}\sum_{w=-n}^{n}a_u\bar a_w\operatorname{Cov}(Z_{t-u},Z_{s-w})\\ &= \lim_{n\to\infty}\sum_{u=-n}^{n}\sum_{w=-n}^{n}a_u\bar a_w\gamma_Z(t-s+w-u)\\ &= \sum_u\sum_w a_u\bar a_w\gamma_Z(t-s+w-u).\end{aligned}$$
This implies
$$\begin{aligned} G(z) &= \sigma^2\sum_t\sum_u a_u\bar a_{u-t}z^t\\ &= \sigma^2\Bigl(\sum_u|a_u|^2 + \sum_{t\ge1}\sum_u a_u\bar a_{u-t}z^t + \sum_{t\le-1}\sum_u a_u\bar a_{u-t}z^t\Bigr)\\ &= \sigma^2\Bigl(\sum_u|a_u|^2 + \sum_u\sum_{t\le u-1}a_u\bar a_t z^{u-t} + \sum_u\sum_{t\ge u+1}a_u\bar a_t z^{u-t}\Bigr)\\ &= \sigma^2\sum_u\sum_t a_u\bar a_t z^{u-t} = \sigma^2\Bigl(\sum_u a_u z^u\Bigr)\Bigl(\sum_t\bar a_t z^{-t}\Bigr).\end{aligned}$$
Example 2.1.8. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise with $\operatorname{Var}(\varepsilon_0) =: \sigma^2 > 0$. The covariance generating function of the simple moving average $Y_t = \sum_u a_u\varepsilon_{t-u}$ with $a_{-1} = a_0 = a_1 = 1/3$ and $a_u = 0$ elsewhere is then given by
$$G(z) = \frac{\sigma^2}{9}\bigl(z^{-1}+z^0+z^1\bigr)\bigl(z^1+z^0+z^{-1}\bigr) = \frac{\sigma^2}{9}\bigl(z^{-2}+2z^{-1}+3z^0+2z^1+z^2\bigr),\qquad z\in\mathbb{R}.$$
Then the autocovariances are just the coefficients in the above series
$$\gamma(0) = \frac{\sigma^2}{3},\qquad \gamma(1)=\gamma(-1)=\frac{2\sigma^2}{9},\qquad \gamma(2)=\gamma(-2)=\frac{\sigma^2}{9},\qquad \gamma(k)=0\ \text{elsewhere.}$$
This explains the name covariance generating function.
Inverse Filters
Let now $(a_u)$ and $(b_u)$ be absolutely summable filters and denote by $Y_t := \sum_u a_u Z_{t-u}$ the filtered stationary sequence, where $(Z_u)_{u\in\mathbb{Z}}$ is a stationary process. Filtering $(Y_t)_{t\in\mathbb{Z}}$ by means of $(b_u)$ leads to
$$\sum_w b_w Y_{t-w} = \sum_w\sum_u b_w a_u Z_{t-w-u} = \sum_v\Bigl(\sum_{u+w=v} b_w a_u\Bigr) Z_{t-v},$$
where $c_v := \sum_{u+w=v} b_w a_u$, $v\in\mathbb{Z}$, is an absolutely summable filter:
$$\sum_v |c_v| \le \sum_v\sum_{u+w=v}|b_w a_u| = \Bigl(\sum_u|a_u|\Bigr)\Bigl(\sum_w|b_w|\Bigr) < \infty.$$
Lemma 2.1.9. Let $(a_u)$ and $(b_u)$ be absolutely summable filters with characteristic polynomials $A_1(z)$ and $A_2(z)$, which both exist on some annulus $r < |z| < R$. The product filter $(c_v) = \bigl(\sum_{u+w=v} b_w a_u\bigr)$ then has the characteristic polynomial
$$A(z) = A_1(z)A_2(z).$$
Suppose now that $(a_u)$ and $(b_u)$ are absolutely summable filters with characteristic polynomials $A_1(z)$ and $A_2(z)$, which both exist on some annulus $r < |z| < R$, where they satisfy $A_1(z)A_2(z) = 1$. Since $1 = \sum_v c_v z^v$ if $c_0 = 1$ and $c_v = 0$ elsewhere, the uniquely determined coefficients of the characteristic polynomial of the product filter of $(a_u)$ and $(b_u)$ are given by
$$\sum_{u+w=v} b_w a_u = \begin{cases} 1 & \text{if } v = 0,\\ 0 & \text{if } v \ne 0.\end{cases}$$
In this case we obtain for a stationary process $(Z_t)$ that almost surely
$$Y_t = \sum_u a_u Z_{t-u}\quad\text{and}\quad \sum_w b_w Y_{t-w} = Z_t,\qquad t\in\mathbb{Z}. \tag{2.1}$$
The filter (bu ) is, therefore, called the inverse filter of (au ).
Causal Filters
An absolutely summable filter (au )u∈Z is called causal if au = 0 for
u < 0.
Lemma 2.1.10. Let $a\in\mathbb{C}$. The filter $(a_u)$ with $a_0 = 1$, $a_1 = -a$ and $a_u = 0$ elsewhere has an absolutely summable and causal inverse filter $(b_u)_{u\ge0}$ if and only if $|a| < 1$. In this case we have $b_u = a^u$, $u\ge0$.
Proof. The characteristic polynomial of (au ) is A1 (z) = 1−az, z ∈ C.
Since the characteristic polynomial A2 (z) of an inverse filter satisfies
A1 (z)A2 (z) = 1 on some annulus, we have A2 (z) = 1/(1−az). Observe
now that
$$\frac{1}{1-az} = \sum_{u\ge0} a^u z^u,\qquad \text{if } |z| < 1/|a|.$$
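Indeed, multiplying the two characteristic polynomials confirms that $(b_u)_{u\ge0} = (a^u)_{u\ge0}$ is the inverse filter:
$$A_1(z)A_2(z) = (1-az)\sum_{u\ge0}a^uz^u = \sum_{u\ge0}a^uz^u - \sum_{u\ge1}a^uz^u = 1,\qquad |z| < 1/|a|.$$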
$$A(z) = a_p(z-z_1)\cdots(z-z_p) = c\Bigl(1-\frac{z}{z_1}\Bigr)\Bigl(1-\frac{z}{z_2}\Bigr)\cdots\Bigl(1-\frac{z}{z_p}\Bigr),$$
which has an expansion $1/A(z) = \sum_{u\ge0}b_u z^u$ on some annulus with $\sum_{u\ge0}|b_u| < \infty$ if each factor has such an expansion, and thus, the proof is complete.
$$\text{(iv)}\quad \rho(v) = \frac{\gamma(v)}{\gamma(0)} = \begin{cases} 1, & v = 0,\\[4pt] \sum_{w=0}^{q-v} a_{v+w}a_w \Big/ \sum_{w=0}^{q} a_w^2, & 0 < v \le q,\\[4pt] 0, & v > q, \end{cases}\qquad \rho(-v) = \rho(v).$$
Example 2.2.2. The MA(1)-process $Y_t = \varepsilon_t + a\varepsilon_{t-1}$ with $a \ne 0$ has the autocorrelation function
$$\rho(v) = \begin{cases} 1, & v = 0,\\ a/(1+a^2), & v = \pm1,\\ 0 & \text{elsewhere.}\end{cases}$$
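In the style of the simulation programs below, a realization of an MA(1)-process and its empirical autocorrelations can be generated as follows (a sketch; the parameter a = 0.6, the sample size and the seed are chosen for illustration only). By the example above, r(1) should be close to a/(1 + a²) ≈ 0.44 and r(k), k ≥ 2, close to zero.

/* ma1_simulation.sas (illustrative sketch) */
DATA ma1;
a=0.6;
eold=RANNOR(41);             /* epsilon_0 */
DO t=1 TO 500;
enew=RANNOR(41);             /* epsilon_t */
y=enew+a*eold;               /* Y_t = epsilon_t + a*epsilon_{t-1} */
OUTPUT;
eold=enew;
END;
KEEP t y;

/* Empirical autocorrelations */
PROC ARIMA DATA=ma1;
IDENTIFY VAR=y NLAG=10;
RUN; QUIT;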
Invertible Processes
Example 2.2.2 shows that a MA(q)-process is not uniquely determined
by its autocorrelation function. In order to get a unique relationship
between moving average processes and their autocorrelation function,
Box and Jenkins introduced the condition of invertibility. This is
useful for estimation procedures, since the coefficients of an MA(q)-
process will be estimated later by the empirical autocorrelation func-
tion, see Section 2.3.
The MA(q)-process $Y_t = \sum_{u=0}^{q} a_u\varepsilon_{t-u}$, with $a_0 = 1$ and $a_q \ne 0$, is said to be invertible if all $q$ roots $z_1,\dots,z_q\in\mathbb{C}$ of $A(z) = \sum_{u=0}^{q} a_u z^u$ are outside of the unit circle, i.e., if $|z_i| > 1$ for $1 \le i \le q$. Theorem 2.1.11 and representation (2.1) imply that the white noise process $(\varepsilon_t)$, pertaining to an invertible MA(q)-process $Y_t = \sum_{u=0}^{q}a_u\varepsilon_{t-u}$, can be obtained by means of an absolutely summable and causal filter $(b_u)_{u\ge0}$ via
$$\varepsilon_t = \sum_{u\ge0} b_u Y_{t-u},\qquad t\in\mathbb{Z},$$
Autoregressive Processes
A real valued stochastic process $(Y_t)$ is said to be an autoregressive
process of order p, denoted by AR(p), if there exist $a_1,\dots,a_p\in\mathbb{R}$ with
$a_p \ne 0$, and a white noise $(\varepsilon_t)$ such that
$$Y_t = a_1Y_{t-1} + \dots + a_pY_{t-p} + \varepsilon_t,\qquad t\in\mathbb{Z}. \tag{2.3}$$
The value of an AR(p)-process at time t is, therefore, regressed on its
own past p values plus a random shock.
ρ(s) = a|s| , s ∈ Z.
/* ar1_autocorrelation.sas */
TITLE1 'Autocorrelation functions of AR(1)-processes';

/* Generate data for different autocorrelation functions */
DATA data1;
DO a=-0.7, 0.5, 0.9;
DO s=0 TO 20;
rho=a**s;
OUTPUT;
END;
END;

/* Graphical options */
SYMBOL1 C=GREEN V=DOT I=JOIN H=0.3 L=1;
SYMBOL2 C=GREEN V=DOT I=JOIN H=0.3 L=2;
SYMBOL3 C=GREEN V=DOT I=JOIN H=0.3 L=33;
AXIS1 LABEL=('s');
AXIS2 LABEL=(F=CGREEK 'r' F=COMPLEX H=1 'a' H=2 '(s)');
LEGEND1 LABEL=('a=') SHAPE=SYMBOL(10,0.6);

/* Plot autocorrelation functions */
PROC GPLOT DATA=data1;
PLOT rho*s=a / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1 VREF=0;
RUN; QUIT;
The data step evaluates rho for three different values of a and the range of s from 0 to 20 using two loops. The plot is generated by the procedure GPLOT. The LABEL option in the AXIS2 statement uses, in addition to the greek font CGREEK, the font COMPLEX, assuming this to be the default text font (GOPTION FTEXT=COMPLEX). The SHAPE option SHAPE=SYMBOL(10,0.6) in the LEGEND statement defines width and height of the symbols presented in the legend.
/* ar1_plot.sas */
TITLE1 'Realizations of AR(1)-processes';

/* Generate AR(1)-processes */
DATA data1;
DO a=0.5, 1.5;
t=0; y=0; OUTPUT;
DO t=1 TO 10;
y=a*y+RANNOR(1);
OUTPUT;
END;
END;

/* Graphical options */
SYMBOL1 C=GREEN V=DOT I=JOIN H=0.4 L=1;
SYMBOL2 C=GREEN V=DOT I=JOIN H=0.4 L=2;
AXIS1 LABEL=('t') MINOR=NONE;
AXIS2 LABEL=('Y' H=1 't');
LEGEND1 LABEL=('a=') SHAPE=SYMBOL(10,0.6);

/* Plot the AR(1)-processes */
PROC GPLOT DATA=data1(WHERE=(t>0));
PLOT y*t=a / HAXIS=AXIS1 VAXIS=AXIS2 LEGEND=LEGEND1;
RUN; QUIT;
The data are generated within two loops, the first one over the two values for a. The variable y is initialized with the value 0 corresponding to t=0. The realizations for t=1, ..., 10 are created within the second loop over t and with the help of the function RANNOR, which returns pseudo random numbers distributed as standard normal. The argument 1 is the initial seed to produce a stream of random numbers. A positive value of this seed always produces the same series of random numbers; a negative value generates a different series each time the program is submitted. A value of y is calculated as the sum of a times the actual value of y and the random number and stored in a new observation. The resulting data set has 22 observations and 3 variables (a, t and y).
In the plot created by PROC GPLOT the initial observations are dropped using the WHERE data set option. Only observations fulfilling the condition t>0 are read into the data set used here. To suppress minor tick marks between the integers 0, 1, ..., 10 the option MINOR in the AXIS1 statement is set to NONE.
$$\hat a_k := R_k^{-1} r_k.$$

$$\alpha(1) = \rho(1),\qquad \alpha(2) = a_2,\qquad \alpha(j) = 0,\quad j\ge3.$$
/* ar2_plot.sas */
TITLE1 'Realisation of an AR(2)-process';

/* Generate AR(2)-process */
DATA data1;
t=-1; y=0; OUTPUT;
t=0; y1=y; y=0; OUTPUT;
DO t=1 TO 200;
y2=y1;
y1=y;
y=0.6*y1-0.3*y2+RANNOR(1);
OUTPUT;
END;

/* Graphical options */
SYMBOL1 C=GREEN V=DOT I=JOIN H=0.3;
AXIS1 LABEL=('t');
AXIS2 LABEL=('Y' H=1 't');

/* Plot the AR(2)-process */
PROC GPLOT DATA=data1(WHERE=(t>0));
PLOT y*t / HAXIS=AXIS1 VAXIS=AXIS2;
RUN; QUIT;
The two initial values of y are defined and stored in an observation by the OUTPUT statement. The second observation contains an additional value y1 for yt−1. Within the loop the values y2 (for yt−2), y1 and y are updated one after the other. The data set used by PROC GPLOT again just contains the observations with t > 0.
/* ar2_epa.sas */
TITLE1 'Empirical partial autocorrelation function';
TITLE2 'of simulated AR(2)-process data';
/* Note that this program requires data1 generated by the previous program (ar2_plot.sas) */

/* ... (PROC ARIMA step computing the partial autocorrelations omitted) ... */
AXIS2 LABEL=('a(k)');

/* Plot autocorrelation function */
PROC GPLOT DATA=corr;
PLOT PARTCORR*LAG / HAXIS=AXIS1 VAXIS=AXIS2 VREF=0;
RUN; QUIT;
This program has to be submitted to SAS for execution within a joint session with Program 2.2.3 (ar2_plot.sas), because it uses the temporary data set data1 generated there. Otherwise you have to add the block of statements concerning the data step to this program. Like in Program 1.3.1 (sunspot_correlogram.sas), the procedure ARIMA with the IDENTIFY statement is used to create a data set. Here we are interested in the variable PARTCORR containing the values of the empirical partial autocorrelation function from the simulated AR(2)-process data. This variable is plotted against the lag stored in the variable LAG.
ARMA-Processes
Moving averages MA(q) and autoregressive AR(p)-processes are special cases of so-called autoregressive moving average processes. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise, $p, q \ge 0$ integers and $a_0,\dots,a_p, b_0,\dots,b_q\in\mathbb{R}$. A real valued stochastic process $(Y_t)_{t\in\mathbb{Z}}$ is said to be an autoregressive moving average process of order $p, q$, denoted by ARMA(p, q), if it satisfies the equation
$$Y_t = a_1Y_{t-1} + \dots + a_pY_{t-p} + \varepsilon_t + b_1\varepsilon_{t-1} + \dots + b_q\varepsilon_{t-q}.$$
A(z) := 1 − a1 z − · · · − ap z p (2.13)
and
B(z) := 1 + b1 z + · · · + bq z q , (2.14)
are the characteristic polynomials of the autoregressive part and of
the moving average part of an ARMA(p, q)-process (Yt ), which we
can represent in the form
$$Y_t = \sum_{v\ge0}\sum_{w=0}^{\min(v,q)} b_w d_{v-w}\,\varepsilon_{t-v} =: \sum_{v\ge0}\alpha_v\varepsilon_{t-v}$$
and, in the invertible case,
$$\varepsilon_t = -\sum_{v\ge0}\sum_{w=0}^{\min(v,p)} a_w g_{v-w}\,Y_{t-v}.$$
$$A(z)\bigl(B(z)D(z)\bigr) = B(z)$$
$$\Leftrightarrow\quad -\sum_{u=0}^{p} a_u z^u \sum_{v\ge0}\alpha_v z^v = \sum_{w=0}^{q} b_w z^w$$
$$\Leftrightarrow\quad -\sum_{w\ge0}\sum_{u+v=w} a_u\alpha_v\, z^w = \sum_{w\ge0} b_w z^w$$
$$\Leftrightarrow\quad -\sum_{w\ge0}\Bigl(\sum_{u=0}^{w} a_u\alpha_{w-u}\Bigr) z^w = \sum_{w\ge0} b_w z^w$$
$$\Leftrightarrow\quad \begin{cases}\alpha_0 = 1,\\[2pt] \alpha_w - \sum_{u=1}^{w} a_u\alpha_{w-u} = b_w &\text{for } 1\le w\le p,\\[2pt] \alpha_w - \sum_{u=1}^{p} a_u\alpha_{w-u} = b_w &\text{for } w > p,\ \text{with } b_w = 0\ \text{for } w > q.\end{cases} \tag{2.15}$$
$$\alpha_0 = 1,\qquad \alpha_1 - a = b,\qquad \alpha_w - a\,\alpha_{w-1} = 0,\quad w\ge2,$$
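Solving this recursion explicitly yields the coefficients of the moving average representation of the ARMA(1,1)-process:
$$\alpha_0 = 1,\qquad \alpha_w = a^{w-1}(a+b),\quad w\ge1,$$
i.e., $Y_t = \varepsilon_t + (a+b)\sum_{w\ge1}a^{w-1}\varepsilon_{t-w}$.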
Taking the covariance of the ARMA-equation with $Y_{t-s}$ implies
$$\gamma(s) - \sum_{u=1}^{p} a_u\gamma(s-u) = \sum_{v=0}^{q} b_v\operatorname{Cov}(Y_{t-s},\varepsilon_{t-v}).$$
From the representation $Y_{t-s} = \sum_{w\ge0}\alpha_w\varepsilon_{t-s-w}$ and Theorem 2.1.5 we obtain
$$\operatorname{Cov}(Y_{t-s},\varepsilon_{t-v}) = \sum_{w\ge0}\alpha_w\operatorname{Cov}(\varepsilon_{t-s-w},\varepsilon_{t-v}) = \begin{cases} 0 & \text{if } v < s,\\ \sigma^2\alpha_{v-s} & \text{if } v\ge s.\end{cases}$$
This implies
$$\gamma(s) - \sum_{u=1}^{p} a_u\gamma(s-u) = \sum_{v=s}^{q} b_v\operatorname{Cov}(Y_{t-s},\varepsilon_{t-v}) = \begin{cases}\sigma^2\sum_{v=s}^{q} b_v\alpha_{v-s} & \text{if } s\le q,\\ 0 & \text{if } s > q,\end{cases}$$
and thus
$$\gamma(0) = \sigma^2\,\frac{1+2ab+b^2}{1-a^2},\qquad \gamma(1) = \sigma^2\,\frac{(1+ab)(a+b)}{1-a^2}.$$
For $s\ge2$ we obtain from (2.16)
$$\gamma(s) = a\,\gamma(s-1) = \dots = a^{s-1}\gamma(1).$$
/* arma11_autocorrelation.sas */
TITLE1 'Autocorrelation functions of ARMA(1,1)-processes';

/* Compute autocorrelation functions for different ARMA(1,1)-processes */
DATA data1;
DO a=-0.8, 0.8;
DO b=-0.5, 0, 0.5;
s=0; rho=1;
q=COMPRESS('(' || a || ',' || b || ')');
OUTPUT;
s=1; rho=(1+a*b)*(a+b)/(1+2*a*b+b*b);
q=COMPRESS('(' || a || ',' || b || ')');
OUTPUT;
DO s=2 TO 10;
rho=a*rho;
q=COMPRESS('(' || a || ',' || b || ')');
OUTPUT;
END;
END;
END;

/* Graphical options */
SYMBOL1 C=RED V=DOT I=JOIN H=0.7 L=1;
SYMBOL2 C=YELLOW V=DOT I=JOIN H=0.7 L=2;
SYMBOL3 C=BLUE V=DOT I=JOIN H=0.7 L=33;
SYMBOL4 C=RED V=DOT I=JOIN H=0.7 L=3;
SYMBOL5 C=YELLOW V=DOT I=JOIN H=0.7 L=4;
ARIMA-Processes
Suppose that the time series $(Y_t)$ has a polynomial trend of degree $d$.
Then we can eliminate this trend by considering the process $(\Delta^d Y_t)$,
obtained by $d$ times differencing as described in Section 1.2. If the
filtered process $(\Delta^d Y_t)$ is an ARMA(p, q)-process satisfying the stationarity condition (2.4), the original process $(Y_t)$ is said to be an
autoregressive integrated moving average of order $p, d, q$, denoted by
ARIMA(p, d, q). In this case constants $a_1,\dots,a_p$, $b_0 = 1$, $b_1,\dots,b_q\in\mathbb{R}$ exist such that
$$\Delta^d Y_t = \sum_{u=1}^{p} a_u\,\Delta^d Y_{t-u} + \sum_{w=0}^{q} b_w\,\varepsilon_{t-w},\qquad t\in\mathbb{Z},$$
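In SAS an ARIMA(p, d, q)-model can be fitted with PROC ARIMA, where the differencing order d is attached to the variable in the IDENTIFY statement. The following sketch (data set and variable names are illustrative) fits an ARIMA(1,1,1)-model:

/* arima_fit.sas (illustrative sketch) */
PROC ARIMA DATA=data1;
IDENTIFY VAR=y(1);             /* first order differencing, d=1 */
ESTIMATE P=1 Q=1 METHOD=ML;    /* fit an ARMA(1,1) to the differenced series */
RUN; QUIT;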
Cointegration
In the sequel we will frequently use the notation that a time series
(Yt ) is I(d), d = 0, 1, . . . if the sequence of differences (∆d Yt ) of order
d is a stationary process. By the difference ∆0 Yt of order zero we
denote the undifferenced process Yt , t ∈ Z.
Suppose that the two time series (Yt ) and (Zt ) satisfy
Yt = aWt + εt , Zt = Wt + δt , t ∈ Z,
for some real number $a \ne 0$, where $(W_t)$ is I(1), and $(\varepsilon_t)$, $(\delta_t)$ are
uncorrelated white noise processes, i.e., $\operatorname{Cov}(\varepsilon_t,\delta_s) = 0$, $t,s\in\mathbb{Z}$, and
both are uncorrelated with $(W_t)$.
Then (Yt ) and (Zt ) are both I(1), but
Xt := Yt − aZt = εt − aδt , t ∈ Z,
is I(0).
The fact that the combination of two nonstationary series yields a sta-
tionary process arises from a common component (Wt ), which is I(1).
More generally, two I(1) series $(Y_t)$, $(Z_t)$ are said to be cointegrated if there exist constants $\mu$, $\alpha_1$, $\alpha_2$ with $\alpha_1,\alpha_2 \ne 0$ such that their linear combination
$$X_t = \mu + \alpha_1 Y_t + \alpha_2 Z_t,\qquad t\in\mathbb{Z}, \tag{2.17}$$
is I(0).
Consider, for example, the random walks of a drunkard and his dog,
$$Y_t = Y_{t-1} + \varepsilon_t\quad\text{and}\quad Z_t = Z_{t-1} + \delta_t,$$
where the individual single steps $(\varepsilon_t)$, $(\delta_t)$ of man and dog are uncorrelated white noise processes. Random walks are not stationary, since their variances increase, and so both processes $(Y_t)$ and $(Z_t)$ are not stationary.
And if the dog belongs to the drunkard? We assume the dog to
be unleashed and thus, the distance Yt − Zt between the drunk and
his dog is a random variable. It seems reasonable to assume that
these distances form a stationary process, i.e., that (Yt ) and (Zt ) are
cointegrated with constants α1 = 1 and α2 = −1.
Cointegration requires that both variables in question be I(1), but
that a linear combination of them be I(0). This means that the first
step is to figure out if the series themselves are I(1), typically by using
unit root tests. If one or both are not I(1), cointegration of order 1 is
not an option.
Whether two processes (Yt ) and (Zt ) are cointegrated can be tested
by means of a linear regression approach. This is based on the coin-
2.2 Moving Averages and Autoregressive Processes 83
tegration regression
Yt = β0 + β1 Zt + εt ,
where (εt ) is a stationary process and β0 , β1 ∈ R are the cointegration
constants.
One can use the ordinary least squares estimates $\hat\beta_0$, $\hat\beta_1$ of the target parameters $\beta_0$, $\beta_1$, which satisfy
$$\sum_{t=1}^{n}\bigl(Y_t - \hat\beta_0 - \hat\beta_1 Z_t\bigr)^2 = \min_{\beta_0,\beta_1\in\mathbb{R}}\ \sum_{t=1}^{n}\bigl(Y_t - \beta_0 - \beta_1 Z_t\bigr)^2,$$
(i) Determine that the two series are I(1) by standard unit root tests such as Dickey–Fuller or augmented Dickey–Fuller.

(ii) Compute the residuals $\hat\varepsilon_t = Y_t - \hat\beta_0 - \hat\beta_1 Z_t$ by ordinary least squares.

(iii) Examine $\hat\varepsilon_t$ for stationarity, using for example the Phillips–Ouliaris test.
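In SAS, steps (ii) and (iii) can be carried out, for example, with PROC AUTOREG, whose STATIONARITY=(PHILLIPS) option computes the Phillips–Ouliaris test from the residuals of the regression. The sketch below uses the variable names of the Hog Data example that follows:

/* cointegration_test.sas (illustrative sketch) */
PROC AUTOREG DATA=data3;
/* cointegration regression of supply on price with Phillips-Ouliaris test */
MODEL supply=price / STATIONARITY=(PHILLIPS);
RUN;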
Example 2.2.11. (Hog Data) Quenouille's (1968) Hog Data list the
annual hog supply and hog prices in the U.S. between 1867 and 1948.
Do they provide a typical example of cointegrated series? A discussion
can be found in Box and Tiao (1977).
84 Models of Time Series
/* hog.sas */
TITLE1 'Hog supply, hog prices and differences';
TITLE2 'Hog Data (1867-1948)';
/* Note that this program requires the macro mkfields.sas to be submitted before this program */

/* Read in the two data sets */
DATA data1;
INFILE 'c:\data\hogsuppl.txt';
INPUT supply @@;

DATA data2;
INFILE 'c:\data\hogprice.txt';
INPUT price @@;

/* Merge data sets, generate year and compute differences */
DATA data3;
MERGE data1 data2;
year=_N_+1866;
diff=supply-price;

/* Graphical options */
SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5 W=1;
AXIS1 LABEL=(ANGLE=90 'h o g s u p p l y');
AXIS2 LABEL=(ANGLE=90 'h o g p r i c e s');
AXIS3 LABEL=(ANGLE=90 'd i f f e r e n c e s');

/* Generate three plots */
GOPTIONS NODISPLAY;
PROC GPLOT DATA=data3 GOUT=abb;
PLOT supply*year / VAXIS=AXIS1;
PLOT price*year / VAXIS=AXIS2;
PLOT diff*year / VAXIS=AXIS3 VREF=0;
RUN;

/* Display them in one output */
GOPTIONS DISPLAY;
PROC GREPLAY NOFS IGOUT=abb TC=SASHELP.TEMPLT;
TEMPLATE=V3;
TREPLAY 1:GPLOT 2:GPLOT1 3:GPLOT2;
RUN; DELETE _ALL_; QUIT;
The supply data and the price data read in from two external files are merged in data3. Year is an additional variable with values 1867, 1868, ..., 1948. By PROC GPLOT hog supply, hog prices and their differences diff are plotted in three different plots stored in the graphics catalog abb. The horizontal line at the zero level is plotted by the option VREF=0. The plots are put into a common graphic using PROC GREPLAY and the template V3. Note that the labels of the vertical axes are spaced out as SAS sets their characters too close otherwise.
For the program to work properly the macro mkfields.sas has to be submitted beforehand.
Hog supply (=: yt ) and hog price (=: zt ) obviously increase in time t
and do, therefore, not seem to be realizations of stationary processes;
nevertheless, as they behave similarly, a linear combination of both
might be stationary. In this case, hog supply and hog price would be
cointegrated.
This phenomenon can easily be explained as follows. A high price zt
at time t is a good reason for farmers to breed more hogs, thus leading
to a large supply yt+1 in the next year t + 1. This makes the price zt+1
fall with the effect that farmers will reduce their supply of hogs in the
following year t + 2. However, when hogs are in short supply, their
price $z_{t+2}$ will rise again, etc. There is obviously some error correction mechanism inherent in these two processes, and the observed cointegration of hog supply and hog prices reflects it.

[SAS output: Dickey–Fuller test statistics 0.12676 and 0.86027 with corresponding p-values 0.70968 and 0.88448.]
In the next step the corresponding test statistics are calculated by (2.21). The factor 81 comes from the fact that the hog data contain 82 observations and the regression is carried out with 81 observations.
After that the corresponding p-values are computed. The function PROBDF, which completes this task, expects four arguments: first the test statistic, then the sample size of the regression, then the number of autoregressive variables in (2.18) to (2.20) (in our case 1), and finally a three-letter specification of which of the models (2.18) to (2.20) is to be tested. The first letter states in which way γ is estimated (R for regression, S for a studentized test statistic which we did not explain), and the last two letters state the model (ZM (zero mean) for (2.18), SM (single mean) for (2.19), TR (trend) for (2.20)).
In the final step the test statistics and corresponding p-values are given to the output window.
The p-values do not lead to a rejection of the hypothesis that we have two I(1) series under model (2.18) at the 5%-level, since they are both larger than 0.05 and thus support that γ = 0.
Since we have checked that both hog series can be regarded as I(1)
we can now check for cointegration.
Phillips–Ouliaris Cointegration Test

[SAS output: test statistics RHO = −28.9109 and TAU = −4.0142.]
(i) If model (2.18) with γ = 0 has been validated for both series,
then use the following table for critical values of RHO and TAU.
This is the so-called standard case.
(ii) If model (2.19) with γ = 0 has been validated for both series,
then use the following table for critical values of RHO and TAU.
This case is referred to as demeaned .
α 0.15 0.125 0.1 0.075 0.05 0.025 0.01
RHO -14.91 -15.93 -17.04 -18.48 -20.49 -23.81 -28.32
TAU -2.86 -2.96 -3.07 -3.20 -3.37 -3.64 -3.96
(iii) If model (2.20) with γ = 0 has been validated for both series,
then use the following table for critical values of RHO and TAU.
This case is said to be demeaned and detrended .
α 0.15 0.125 0.1 0.075 0.05 0.025 0.01
RHO -20.79 -21.81 -23.19 -24.75 -27.09 -30.84 -35.42
TAU -3.33 -3.42 -3.52 -3.65 -3.80 -4.07 -4.36
which yields
$$\sigma^2 = \frac{a_0}{1-\sum_{j=1}^{p} a_j}.$$
The squared process satisfies
$$Y_t^2 = a_0 + a_1 Y_{t-1}^2 + \dots + a_p Y_{t-p}^2 + \varepsilon_t,$$
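A simulated path illustrates this. The following sketch (parameters a0 = 0.5, a1 = 0.4 and the seed are illustrative) generates an ARCH(1)-series Yt = σtZt with standard normal Zt; by the equation above, its squares then follow an autoregression of order 1.

/* arch1_simulation.sas (illustrative sketch) */
DATA arch1;
a0=0.5; a1=0.4;
y=0;                          /* start value Y_0 = 0 */
DO t=1 TO 500;
sigma2=a0+a1*y**2;            /* conditional variance sigma_t^2 */
y=SQRT(sigma2)*RANNOR(42);    /* Y_t = sigma_t * Z_t */
y2=y**2;
OUTPUT;
END;
KEEP t y y2;
RUN;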
yt := log(pt ) − log(pt−1 ),
Plot 2.2.9: Log returns of Hang Seng index and their squares.
/* hongkong_plot.sas */
TITLE1 'Daily log returns and their squares';
TITLE2 'Hongkong Data';

/* Read in the data, compute log returns and their squares */
DATA data1;
INFILE 'c:\data\hongkong.txt';
INPUT p @@;
t=_N_;
y=DIF(LOG(p));
y2=y**2;

/* Graphical options */
SYMBOL1 C=RED V=DOT H=0.5 I=JOIN L=1;
AXIS1 LABEL=('y' H=1 't') ORDER=(-.12 TO .10 BY .02);
AXIS2 LABEL=('y2' H=1 't');

GOPTIONS NODISPLAY;
PROC GPLOT DATA=data1 GOUT=abb;
PLOT y*t / VAXIS=AXIS1;
PLOT y2*t / VAXIS=AXIS2;
RUN;

/* Display them in one output */
GOPTIONS DISPLAY;
PROC GREPLAY NOFS IGOUT=abb TC=SASHELP.TEMPLT;
TEMPLATE=V2;
TREPLAY 1:GPLOT 2:GPLOT1;
RUN; DELETE _ALL_; QUIT;
In the DATA step the observed values of the Hang Seng closing index are read into the variable p from an external file. The time index variable t uses the SAS-variable _N_, and the log transformed and differenced values of the index are stored in the variable y, their squared values in y2.
After defining different axis labels, two plots are generated by two PLOT statements in PROC GPLOT, but they are not displayed. By means of PROC GREPLAY the plots are merged vertically in one graphic.
The four steps are discussed in the following. The application of the
Box–Jenkins Program is presented in a case study in Chapter 7.
Order Selection
The order q of a moving average MA(q)-process can be estimated
by means of the empirical autocorrelation function r(k), i.e., by the
correlogram. Part (iv) of Lemma 2.2.1 shows that the autocorrelation
function ρ(k) vanishes for k ≥ q + 1. This suggests choosing the
order q such that r(q) is clearly different from zero, whereas r(k) for
k ≥ q + 1 is quite close to zero. This, however, is obviously a rather
vague selection rule.
$$\mathrm{BIC}(p,q) := \log(\hat\sigma_{p,q}^2) + \frac{(p+q)\log(n)}{n}$$
and the Hannan–Quinn Criterion
$$\mathrm{HQ}(p,q) := \log(\hat\sigma_{p,q}^2) + \frac{2(p+q)c\log(\log(n))}{n},\qquad c > 1.$$
Estimation of Coefficients
Suppose we have fixed the orders p and q of an ARMA(p, q)-process $(Y_t)_{t\in\mathbb{Z}}$,
with $Y_1,\dots,Y_n$ now modelling the data $y_1,\dots,y_n$. In the next step
we will derive estimators of the constants $a_1,\dots,a_p$, $b_1,\dots,b_q$ in the
model
$$\varphi_{\mu,\Sigma}(x_1,\dots,x_n) = \frac{1}{(2\pi)^{n/2}(\det\Sigma)^{1/2}}\exp\Bigl(-\frac12\bigl((x_1,\dots,x_n)-\mu^T\bigr)\Sigma^{-1}\bigl((x_1,\dots,x_n)^T-\mu\bigr)\Bigr)$$
is for arbitrary $x_1,\dots,x_n\in\mathbb{R}$ the density of the n-dimensional normal distribution with mean vector $\mu = (\mu,\dots,\mu)^T\in\mathbb{R}^n$ and covariance matrix $\Sigma = (\gamma(i-j))_{1\le i,j\le n}$, denoted by $N(\mu,\Sigma)$, where $\mu = E(Y_0)$ and $\gamma$ is the autocovariance function of the stationary process $(Y_t)$.
The number $\varphi_{\mu,\Sigma}(x_1,\dots,x_n)$ reflects the probability that the random vector $(Y_1,\dots,Y_n)$ realizes close to $(x_1,\dots,x_n)$. Precisely, we have for $\varepsilon\downarrow0$
$$P\{Y_i\in[x_i-\varepsilon,\,x_i+\varepsilon],\ i=1,\dots,n\} \approx \varepsilon^n 2^n\,\varphi_{\mu,\Sigma}(x_1,\dots,x_n).$$
The matrix
$$\Sigma' := \sigma^{-2}\Sigma,$$
therefore, depends only on $a_1,\dots,a_p$ and $b_1,\dots,b_q$.
We can now write the density $\varphi_{\mu,\Sigma}(x_1,\dots,x_n)$ as a function of $\vartheta := (\sigma^2,\mu,a_1,\dots,a_p,b_1,\dots,b_q)\in\mathbb{R}^{p+q+2}$ and $(x_1,\dots,x_n)\in\mathbb{R}^n$:
$$p(x_1,\dots,x_n\mid\vartheta) := \varphi_{\mu,\Sigma}(x_1,\dots,x_n) = (2\pi\sigma^2)^{-n/2}(\det\Sigma')^{-1/2}\exp\Bigl(-\frac{1}{2\sigma^2}Q(\vartheta\mid x_1,\dots,x_n)\Bigr),$$
where
$$Q(\vartheta\mid x_1,\dots,x_n) := \bigl((x_1,\dots,x_n)-\mu^T\bigr)\,\Sigma'^{-1}\bigl((x_1,\dots,x_n)^T-\mu\bigr)$$
$$l(\hat\vartheta\mid y_1,\dots,y_n) = \sup_{\vartheta}\, l(\vartheta\mid y_1,\dots,y_n) = \sup_{\vartheta}\Bigl(-\frac{n}{2}\log(2\pi\sigma^2) - \frac12\log(\det\Sigma') - \frac{1}{2\sigma^2}Q(\vartheta\mid y_1,\dots,y_n)\Bigr).$$
$$\Sigma' = \frac{1}{1-a^2}\begin{pmatrix}1 & a & a^2 & \dots & a^{n-1}\\ a & 1 & a & \dots & a^{n-2}\\ \vdots & \vdots & \ddots & & \vdots\\ a^{n-1} & a^{n-2} & a^{n-3} & \dots & 1\end{pmatrix}.$$
$$L(\vartheta\mid y_1,\dots,y_n) = (2\pi\sigma^2)^{-n/2}(1-a^2)^{1/2}\exp\Bigl(-\frac{1}{2\sigma^2}Q(\vartheta\mid y_1,\dots,y_n)\Bigr),$$
where
$$\begin{aligned}Q(\vartheta\mid y_1,\dots,y_n) &= \bigl((y_1,\dots,y_n)-\mu\bigr)\Sigma'^{-1}\bigl((y_1,\dots,y_n)-\mu\bigr)^T\\ &= (y_1-\mu)^2 + (y_n-\mu)^2 + (1+a^2)\sum_{i=2}^{n-1}(y_i-\mu)^2 - 2a\sum_{i=1}^{n-1}(y_i-\mu)(y_{i+1}-\mu).\end{aligned}$$
based on Yt−1 , . . . , Yt−p and εt−1 , . . . , εt−q . The prediction error is given
by the residual
Yt − Ŷt = εt .
Suppose that $\hat\varepsilon_t$ is an estimator of $\varepsilon_t$, $t\le n$, which depends on the choice of constants $a_1,\dots,a_p$, $b_1,\dots,b_q$ and satisfies the recursion
$$\hat\varepsilon_t = y_t - a_1y_{t-1} - \dots - a_py_{t-p} - b_1\hat\varepsilon_{t-1} - \dots - b_q\hat\varepsilon_{t-q}.$$
The function
$$S^2(a_1,\dots,a_p,b_1,\dots,b_q) = \sum_{t=-\infty}^{n}\hat\varepsilon_t^2 = \sum_{t=-\infty}^{n}\bigl(y_t - a_1y_{t-1} - \dots - a_py_{t-p} - b_1\hat\varepsilon_{t-1} - \dots - b_q\hat\varepsilon_{t-q}\bigr)^2$$
is the residual sum of squares, and the least squares approach suggests
estimating the underlying set of constants by minimizers $a_1,\dots,a_p$
and $b_1,\dots,b_q$ of $S^2$. Note that the residuals $\hat\varepsilon_t$ and the constants are
nested.
We have no observation yt available for t ≤ 0. But from the assump-
tion E(εt ) = 0 and thus E(Yt ) = 0, it is reasonable to backforecast yt
by zero and to put ε̂t := 0 for t ≤ 0, leading to
$$S^2(a_1,\dots,a_p,b_1,\dots,b_q) = \sum_{t=1}^{n}\hat\varepsilon_t^2.$$
The estimated residuals $\hat\varepsilon_t$ can then be computed from the recursion
$$\begin{aligned}\hat\varepsilon_1 &= y_1\\ \hat\varepsilon_2 &= y_2 - a_1y_1 - b_1\hat\varepsilon_1\\ \hat\varepsilon_3 &= y_3 - a_1y_2 - a_2y_1 - b_1\hat\varepsilon_2 - b_2\hat\varepsilon_1\\ &\ \,\vdots\\ \hat\varepsilon_j &= y_j - a_1y_{j-1} - \dots - a_py_{j-p} - b_1\hat\varepsilon_{j-1} - \dots - b_q\hat\varepsilon_{j-q},\end{aligned} \tag{2.23}$$
where $j$ now runs from $\max\{p,q\}+1$ to $n$.
For example, for an ARMA(2, 3)-process we have
$$\begin{aligned}\hat\varepsilon_1 &= y_1\\ \hat\varepsilon_2 &= y_2 - a_1y_1 - b_1\hat\varepsilon_1\\ \hat\varepsilon_3 &= y_3 - a_1y_2 - a_2y_1 - b_1\hat\varepsilon_2 - b_2\hat\varepsilon_1\\ \hat\varepsilon_4 &= y_4 - a_1y_3 - a_2y_2 - b_1\hat\varepsilon_3 - b_2\hat\varepsilon_2 - b_3\hat\varepsilon_1\\ \hat\varepsilon_5 &= y_5 - a_1y_4 - a_2y_3 - b_1\hat\varepsilon_4 - b_2\hat\varepsilon_3 - b_3\hat\varepsilon_2\\ &\ \,\vdots\end{aligned}$$
From the fourth step on, this recursion involves the full set of coefficients of the ARMA(2, 3)-model.
The coefficients a1 , . . . , ap of a pure AR(p)-process can be estimated
directly, using the Yule–Walker equations as described in (2.8).
Diagnostic Check
Suppose that the orders p and q as well as the constants a1 , . . . , ap
and b1 , . . . , bq have been chosen in order to model an ARMA(p, q)-
process underlying the data. The Portmanteau-test of Box and Pierce
(1970) checks whether the estimated residuals $\hat\varepsilon_t$, $t = 1,\dots,n$, behave
approximately like realizations from a white noise process. To this end
one considers the pertaining empirical autocorrelation function
one considers the pertaining empirical autocorrelation function
Pn−k
j=1 (ε̂j − ε̄)(ε̂j+k − ε̄)
r̂ε̂ (k) := Pn 2
, k = 1, . . . , n − 1, (2.24)
j=1 (ε̂ j − ε̄)
where ε̄ = n−1 nj=1 ε̂j , and checks, whether the values r̂ε (k) are suf-
P
ficiently close to zero. This decision is based on
$$Q(K) := n\sum_{k=1}^{K}\hat r_{\hat\varepsilon}^2(k),$$
A modification with better finite sample properties is due to Ljung and Box (1978):
$$Q^*(K) := n\sum_{k=1}^{K}\Bigl(\Bigl(\frac{n+2}{n-k}\Bigr)^{1/2}\hat r_{\hat\varepsilon}(k)\Bigr)^2 = n(n+2)\sum_{k=1}^{K}\frac{1}{n-k}\,\hat r_{\hat\varepsilon}^2(k).$$
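Under the hypothesis that the residuals stem from a white noise process, $Q^*(K)$ is asymptotically $\chi^2$-distributed with $K - p - q$ degrees of freedom; the fitted ARMA(p, q)-model is therefore rejected if $Q^*(K)$ exceeds the corresponding $\chi^2$-quantile.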
Forecasting
We want to determine weights $c_0^*,\dots,c_{n-1}^*\in\mathbb{R}$ such that for $h\in\mathbb{N}$
$$E\Bigl(\bigl(Y_{n+h}-\hat Y_{n+h}\bigr)^2\Bigr) = \min_{c_0,\dots,c_{n-1}\in\mathbb{R}}\ E\Bigl(\Bigl(Y_{n+h}-\sum_{u=0}^{n-1}c_uY_{n-u}\Bigr)^2\Bigr),$$
with $\hat Y_{n+h} := \sum_{u=0}^{n-1}c_u^*Y_{n-u}$. Then $\hat Y_{n+h}$ with minimum mean squared error is said to be a best (linear) h-step forecast of $Y_{n+h}$, based on $Y_1,\dots,Y_n$. The following result provides a sufficient condition for the optimality of weights.
$Y_1,\dots,Y_n$. Then we have
$$\begin{aligned} E\bigl((Y_{n+h}-\tilde Y_{n+h})^2\bigr) &= E\bigl((Y_{n+h}-\hat Y_{n+h}+\hat Y_{n+h}-\tilde Y_{n+h})^2\bigr)\\ &= E\bigl((Y_{n+h}-\hat Y_{n+h})^2\bigr) + 2\sum_{u=0}^{n-1}(c_u^*-c_u)E\bigl(Y_{n-u}(Y_{n+h}-\hat Y_{n+h})\bigr) + E\bigl((\hat Y_{n+h}-\tilde Y_{n+h})^2\bigr)\\ &= E\bigl((Y_{n+h}-\hat Y_{n+h})^2\bigr) + E\bigl((\hat Y_{n+h}-\tilde Y_{n+h})^2\bigr)\\ &\ge E\bigl((Y_{n+h}-\hat Y_{n+h})^2\bigr). \end{aligned}$$
Suppose that $(Y_t)$ is a stationary process with mean zero and autocorrelation function $\rho$. The equations (2.25) are then of Yule–Walker type:
$$\rho(h+s) = \sum_{u=0}^{n-1}c_u^*\,\rho(s-u),\qquad s = 0,1,\dots,n-1,$$
or, in matrix language,
$$\begin{pmatrix}\rho(h)\\ \rho(h+1)\\ \vdots\\ \rho(h+n-1)\end{pmatrix} = \boldsymbol{P}_n\begin{pmatrix}c_0^*\\ c_1^*\\ \vdots\\ c_{n-1}^*\end{pmatrix} \tag{2.26}$$
from which the assertion for h = 1 follows by Lemma 2.3.2. The case
of an arbitrary h ≥ 2 is now a consequence of the recursion
until Ybi and ε̂i are defined for i = 1, . . . , n. The actual forecast is then
given by
Exercises
2.1. Show that the expectation of complex valued random variables
is linear, i.e.,
E(aY + bZ) = a E(Y ) + b E(Z),
where a, b ∈ C and Y, Z are integrable.
Show that
Cov(Y, Z) = E(Y Z̄) − E(Y )E(Z̄)
for square integrable complex random variables Y and Z.
2.2. Suppose that the complex random variables Y and Z are square
integrable. Show that
2.3. Give an example of a stochastic process (Yt ) such that for arbi-
trary $t_1, t_2\in\mathbb{Z}$ and $k \ne 0$
stationary?
2.7. Show that the autocovariance function γ : Z → C of a complex-
valued stationary process (Yt )t∈Z , which is defined by
(i) Show that $c(k)$ is a biased estimator of $\gamma(k)$ (even if the factor $n^{-1}$ is replaced by $(n-k)^{-1}$), i.e., $E(c(k)) \ne \gamma(k)$.
$$C_k := \begin{pmatrix} c(0) & c(1) & \dots & c(k-1)\\ c(1) & c(0) & \dots & c(k-2)\\ \vdots & \vdots & \ddots & \vdots\\ c(k-1) & c(k-2) & \dots & c(0) \end{pmatrix}.$$
2.11. Let (Yt )t∈Z be a stationary process with mean zero. If its au-
tocovariance function satisfies γ(τ ) = 0 for some τ > 0, then γ is
periodic with length τ , i.e., γ(t + τ ) = γ(t), t ∈ Z.
2.14. Let $(\varepsilon_t)_{t\in\mathbb{Z}}$ be a white noise. The process $Y_t = \sum_{s=1}^{t}\varepsilon_s$ is said
to be a random walk. Plot the path of a random walk with normal
$N(\mu,\sigma^2)$ distributed $\varepsilon_t$ for each of the cases $\mu < 0$, $\mu = 0$ and $\mu > 0$.
2.15. Let $(a_u)$, $(b_u)$ be absolutely summable filters and let $(Z_t)$ be a
stochastic process with $\sup_{t\in\mathbb{Z}} E(Z_t^2) < \infty$. Put for $t\in\mathbb{Z}$
$$X_t = \sum_u a_u Z_{t-u},\qquad Y_t = \sum_v b_v Z_{t-v}.$$
Then we have
$$E(X_tY_t) = \sum_u\sum_v a_u b_v\, E(Z_{t-u}Z_{t-v}).$$
2.18. Compute the orders p and the coefficients a_u of the process
Y_t = Σ_{u=0}^{p} a_u ε_{t−u} with Var(ε_0) = 1 and autocovariance function γ(1) =
2, γ(2) = 1, γ(3) = −1 and γ(t) = 0 for t ≥ 4. Is this process
invertible?
and show that (ε̃t ) is a white noise. Show that Var(ε̃t ) = a2 Var(εt )
and (Yt ) has the invertible representation
(iv) two roots are outside, one root inside the unit disk,
(v) one root is outside, one root is inside the unit disk and one root
is on the unit circle,
(vi) all roots are outside the unit disk but close to the unit circle.
2.25. Show that the AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt for a1 =
1/3 and a2 = 2/9 has the autocorrelation function
ρ(k) = (16/21)(2/3)^{|k|} + (5/21)(−1/3)^{|k|},  k ∈ Z,
and for a1 = a2 = 1/12 the autocorrelation function
ρ(k) = (45/77)(1/3)^{|k|} + (32/77)(−1/4)^{|k|},  k ∈ Z.
2.26. Let (εt ) be a white noise with E(ε0 ) = µ, Var(ε0 ) = σ 2 and put
Yt = εt − Yt−1 , t ∈ N, Y0 = 0.
Show that
Corr(Y_s, Y_t) = (−1)^{s+t} min{s, t}/√(st).
2.27. An AR(2)-process Yt = a1 Yt−1 + a2 Yt−2 + εt satisfies the station-
arity condition (2.4), if the pair (a1 , a2 ) is in the triangle
∆ := { (α, β) ∈ R² : −1 < β < 1, α + β < 1 and β − α < 1 }.
2.31. Show that the value at lag 2 of the partial autocorrelation func-
tion of the MA(1)-process
Y_t = ε_t + aε_{t−1},  t ∈ Z,
is
α(2) = −a²/(1 + a² + a⁴).
2.32. (Unemployed1 Data) Plot the empirical autocorrelations and
partial autocorrelations of the trend and seasonally adjusted Unem-
ployed1 Data from the building trade, introduced in Example 1.1.1.
Apply the Box–Jenkins program. Is a fit of a pure MA(q)- or AR(p)-
process reasonable?
2.36. Let (ε_t)_{t∈Z} be a white noise. The process W_t = Σ_{s=1}^{t} ε_s is then
called a random walk. Generate two independent random walks µ_t,
ν_t, t = 1, . . . , 100, where the ε_t are standard normal and independent.
Simulate from these
X_t = µ_t + δ_t^{(1)},  Y_t = µ_t + δ_t^{(2)},  Z_t = ν_t + δ_t^{(3)},
where the δ_t^{(i)} are again independent and standard normal, i = 1, 2, 3.
Plot the generated processes Xt , Yt and Zt and their first order dif-
ferences. Which of these series are cointegrated? Check this by the
Phillips-Ouliaris-test.
2.37. (US Interest Data) The file “us interest rates.txt” contains the
interest rates of three-month, three-year and ten-year federal bonds of
the USA, monthly collected from July 1954 to December 2002. Plot
the data and the corresponding differences of first order. Test also
whether the data are I(1). Check next if the three series are pairwise
cointegrated.
(iii) Evaluate E(Y_t⁴) and deduce that E(Z_1⁴) a_1² < 1 is a sufficient
condition for E(Y_t⁴) < ∞.
2.43. (Zurich Data) The daily value of the Zurich stock index was
recorded between January 1st, 1988 and December 31st, 1988. Use
a difference filter of first order to remove a possible trend. Plot the
(trend-adjusted) data, their squares, the pertaining partial autocor-
relation function and parameter estimates. Can the squared process
be considered as an AR(1)-process?
2.44. Show that the matrix Σ_0^{−1} in Example 2.3.1 has the determi-
nant 1 − a².
2.45. (Car Data) Apply the Box–Jenkins program to the Car Data.
Chapter 3
State-Space Models
In state-space models we have, in general, a nonobservable target
process (X_t) and an observable process (Y_t). They are linked by the
assumption that (Y_t) is a linear function of (X_t) with an added noise,
where the linear function may vary in time. The aim is the derivation
of best linear estimates of X_t, based on (Y_s)_{s≤t}.
They are assumed to satisfy the state equation
X_{t+1} = A_t X_t + B_t ε_{t+1}   (3.1)
and the observation equation
Y_t = C_t X_t + η_t ∈ R^m.   (3.2)
We assume that (At ), (Bt ) and (Ct ) are sequences of known matrices,
(εt ) and (ηt ) are uncorrelated sequences of white noises with mean
vectors 0 and known covariance matrices Cov(εt ) = E(εt εTt ) =: Qt ,
Cov(ηt ) = E(ηt ηtT ) =: Rt .
We suppose further that X_0 and ε_t, η_t, t ≥ 1, are uncorrelated, where
two random vectors W ∈ R^p and V ∈ R^q are said to be uncorrelated
if their components are, i.e., if the matrix Cov(W, V) of their covariances vanishes.
Yt := µt + ηt
and put
A := ( 1  b
       0  1 ).
From the recursion µ_{t+1} = µ_t + b we then obtain the state equation
X_{t+1} = ( µ_{t+1}, 1 )^T = A ( µ_t, 1 )^T = A X_t,
and with
C := (1, 0)
the observation equation
Y_t = (1, 0)( µ_t, 1 )^T + η_t = C X_t + η_t.
An AR(p)-process
Y_t = a_1 Y_{t−1} + · · · + a_p Y_{t−p} + ε_t
admits a state-space representation with the p × 1-matrix
B := (1, 0, . . . , 0)^T =: C^T
and the noiseless observation equation
Y_t = C X_t.
Similarly, an MA(q)-process
Y_t = ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}
has a representation with state vector built from ε_t, . . . , ε_{t−q},
the (q + 1) × 1-matrix
B := (1, 0, . . . , 0)^T,
the 1 × (q + 1)-matrix
C := (1, b_1, . . . , b_q)
and the observation equation
Y_t = C X_t.
Example 3.1.4. Combining the above results for AR(p) and MA(q)-
processes, we obtain a state-space representation of ARMA(p, q)-pro-
cesses
the (p + q) × 1-matrix
B := (1, 0, . . . , 0, 1, 0, . . . , 0)T
with the entry 1 at the first and (p + 1)-th position and the 1 × (p + q)-
matrix
C := (1, 0, . . . , 0).
since in the second-to-last line the final term is nonnegative and the
second one vanishes by the property (3.5).
Let now X̂t−1 be a linear prediction of Xt−1 fulfilling (3.5) based on
the observations Y1 , . . . , Yt−1 . Then
Ỹt := Ct X̃t
Proof. The matrix Kt has to be chosen such that Xt − X̂t and Ys are
uncorrelated for s = 1, . . . , t, i.e.,
0 = E((Xt − X̂t )YsT ) = E((Xt − X̃t − Kt (Yt − Ỹt ))YsT ), s ≤ t.
But this is the assertion of Lemma 3.2.2. Note that Ỹt is a linear
combination of Y1 , . . . , Yt−1 and that ηt and Xt − X̃t are uncorrelated.
The recursion in the discrete Kalman filter is done in two steps: From
X̂_{t−1} and ∆̂_{t−1} one computes in the prediction step first
X̃_t = A_{t−1} X̂_{t−1},
Ỹ_t = C_t X̃_t,
∆̃_t = A_{t−1} ∆̂_{t−1} A_{t−1}^T + B_{t−1} Q_t B_{t−1}^T.   (3.9)
In the updating step one then computes K_t and the updated values X̂_t, ∆̂_t
K_t = ∆̃_t C_t^T ( C_t ∆̃_t C_t^T + R_t )^{−1},
X̂_t = X̃_t + K_t ( Y_t − Ỹ_t ),
∆̂_t = ∆̃_t − K_t C_t ∆̃_t.   (3.10)
An obvious problem is the choice of the initial values X̃_1 and ∆̃_1. One
frequently puts X̃_1 = 0 and ∆̃_1 as the diagonal matrix with constant
entries σ² > 0. The number σ² reflects the degree of uncertainty about
the underlying model. Simulations as well as theoretical results show,
however, that the estimates X̂_t are often not affected by the initial
values X̃_1 and ∆̃_1 if t is large, see for instance Example 3.2.3 below.
If in addition we require that the state-space model (3.1), (3.2) is
completely determined by some parametrization ϑ of the distribution
of (Yt ) and (Xt ), then we can estimate the matrices of the Kalman
filter in (3.9) and (3.10) under suitable conditions by a maximum
likelihood estimate of ϑ; see e.g. Brockwell and Davis (2002, Section
8.5) or Janacek and Swift (1993, Section 4.5).
By iterating the 1-step prediction X̃t = At−1 X̂t−1 of Xt in (3.6)
h times, we obtain the h-step prediction of the Kalman filter
X̃t+h := At+h−1 X̃t+h−1 , h ≥ 1,
with the initial value X̃t+0 := X̂t . The pertaining h-step prediction
of Yt+h is then
Ỹt+h := Ct+h X̃t+h , h ≥ 1.
Example 3.2.3. Let (ηt ) be a white noise in R with E(ηt ) = 0,
E(ηt2 ) = σ 2 > 0 and put for some µ ∈ R
Yt := µ + ηt , t ∈ Z.
Note that all these values are in R. The h-step predictions X̃_{t+h}, Ỹ_{t+h}
are, therefore, given by X̃_{t+h} = X̂_t. The update step (3.10) of the
Kalman filter is
K_t = ∆̂_{t−1}/( ∆̂_{t−1} + σ² ),
X̂_t = X̂_{t−1} + K_t ( Y_t − X̂_{t−1} ),
∆̂_t = ∆̂_{t−1} − K_t ∆̂_{t−1} = ∆̂_{t−1} σ²/( ∆̂_{t−1} + σ² ).
Note that ∆̂_t = E((X_t − X̂_t)²) ≥ 0 and thus,
0 ≤ ∆̂_t = ∆̂_{t−1} σ²/( ∆̂_{t−1} + σ² ) ≤ ∆̂_{t−1},
i.e., (∆̂_t)_t is a decreasing and bounded sequence. Its limit ∆ := lim_{t→∞} ∆̂_t
consequently exists and satisfies
∆ = ∆ σ²/( ∆ + σ² ),
i.e., ∆ = 0. This means that the mean squared error E((Xt − X̂t )2 ) =
E((µ − X̂t )2 ) vanishes asymptotically, no matter how the initial values
X̃1 and ∆ ˜ 1 are chosen. Further we have limt→∞ Kt = 0, which means
that additional observations Yt do not contribute to X̂t if t is large.
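The scalar recursion of this example is easily reproduced in a DATA step. The following sketch simulates Y_t = µ + η_t and runs the update step (3.10); the values µ = 5, σ² = 1, the initial values and the sample size are arbitrary choices for illustration:

/* kalman_constant_mean.sas (sketch) */
DATA kalman;
   mu=5; sigma2=1;
   xhat=0; delta=100;                /* initial values */
   DO t=1 TO 50;
      y=mu+SQRT(sigma2)*RANNOR(1);   /* observation Y_t = mu + eta_t */
      k=delta/(delta+sigma2);        /* gain K_t */
      xhat=xhat+k*(y-xhat);          /* updated estimate X^_t */
      delta=delta-k*delta;           /* updated mean squared error */
      OUTPUT;
   END;
RUN;

PROC PRINT DATA=kalman NOOBS;
RUN; QUIT;

Running this sketch shows k and delta decreasing towards zero while xhat settles near µ, in accordance with ∆ = 0 and lim_{t→∞} K_t = 0.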
Finally, we obtain for the mean squared error of the h-step prediction
Ỹ_{t+h} = X̂_t of Y_{t+h} = µ + η_{t+h}
E((Y_{t+h} − Ỹ_{t+h})²) = E((µ − X̂_t)²) + σ² → σ²,  t → ∞.
Example 3.2.4. The following figure displays the Airline Data from
Example 1.3.1 together with 12-step forecasts based on the Kalman
filter. The original data yt , t = 1, . . . , 144 were log-transformed xt =
log(yt ) to stabilize the variance; first order differences ∆xt = xt − xt−1
were used to eliminate the trend and, finally, zt = ∆xt − ∆xt−12
were computed to remove the seasonal component of 12 months. The
Kalman filter was applied to forecast zt , t = 145, . . . , 156, and the
results were transformed in the reverse order of the preceding steps
to predict the original values yt , t = 145, . . . , 156.
Plot 3.2.1: Airline Data and predicted values using the Kalman filter.
1 /* airline_kalman.sas */
2 TITLE1 ’Original and Forecasted Data’;
3 TITLE2 ’Airline Data’;
4
5 /* Read in the data and compute log-transformation */
6 DATA data1;
7 INFILE ’c:\data\airline.txt’;
8 INPUT y;
9 yl=LOG(y);
10 t=_N_;
11
26 /* Graphical options */
27 LEGEND1 LABEL=(’’) VALUE=(’original’ ’forecast’);
28 SYMBOL1 C=BLACK V=DOT H=0.7 I=JOIN L=1;
29 SYMBOL2 C=BLACK V=CIRCLE H=1.5 I=JOIN L=1;
30 AXIS1 LABEL=(ANGLE=90 ’Passengers’);
31 AXIS2 LABEL=(’January 1949 to December 1961’);
32
33 /* Plot data and forecasts */
34 PROC GPLOT DATA=data4;
35 PLOT y*t=1 yhat*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 LEGEND=LEGEND1;
36 RUN; QUIT;
In the first data step the Airline Data are read into data1. Their logarithm is computed and stored in the variable yl. The variable t contains the observation number. The statement VAR yl(1,12) of the PROC STATESPACE procedure tells SAS to use first order differences of the initial data to remove their trend and to adjust them to a seasonal component of 12 months. The data are identified by the time index set to t. The results are stored in the data set data2 with forecasts of 12 months after the end of the input data. This is invoked by LEAD=12. data3 contains the exponentially transformed forecasts, thereby inverting the log-transformation in the first data step. Finally, the two data sets are merged and displayed in one plot.
Exercises
3.1. Consider the two state-space models
Xt+1 = At Xt + Bt εt+1
Yt = C t X t + η t
and
where (εTt , ηtT , ε̃Tt , η̃tT )T is a white noise. Derive a state-space repre-
sentation for (YtT , ỸtT )T .
3.4. Show that εt and Ys are uncorrelated and that ηt and Ys are
uncorrelated if s < t.
3.6. (Gas Data) Apply PROC STATESPACE to the gas data. Can
they be stationary? Compute the one-step predictors and plot them
together with the actual data.
Chapter 4
The Frequency Domain Approach of a Time Series
The preceding sections focussed on the analysis of a time series in the
time domain, mainly by modelling and fitting an ARMA(p, q)-process
to stationary sequences of observations. Another approach towards
the modelling and analysis of time series is via the frequency domain:
a series is often the sum of a whole variety of cyclic components, of
which we have already added to our model (1.2) a long term cyclic one
and a short term seasonal one.
series can be completely decomposed into cyclic components. Such
cyclic components can be described by their periods and frequencies.
The period is the interval of time required for one cycle to complete.
The frequency of a cycle is its number of occurrences during a fixed
time unit; in electronic media, for example, frequencies are commonly
measured in hertz , which is the number of cycles per second, abbre-
viated by Hz. The analysis of a time series in the frequency domain
aims at the detection of such cycles and the computation of their
frequencies.
Note that in this chapter the results are formulated for any data
y_1, . . . , y_n, which for mathematical reasons need not be generated
by a stationary process. Nevertheless it is reasonable to apply the
results only to realizations of stationary processes, since the empiri-
cal autocovariance function occurring below has no interpretation for
non-stationary processes, see Exercise 1.21.
[SAS output of the least squares fit: analysis of variance and parameter estimates for the model with dependent variable LUMEN]
where
c_ij = Σ_{t=1}^{n} m_i(t) m_j(t).
If c_11 c_22 − c_12 c_21 ≠ 0, the uniquely determined pair of solutions A, B
of these equations is
A = A(λ) = n ( c_22 C(λ) − c_12 S(λ) ) / ( c_11 c_22 − c_12 c_21 ),
B = B(λ) = n ( c_21 C(λ) − c_11 S(λ) ) / ( c_12 c_21 − c_11 c_22 ),
where
C(λ) := (1/n) Σ_{t=1}^{n} (y_t − ȳ) cos(2πλt),
S(λ) := (1/n) Σ_{t=1}^{n} (y_t − ȳ) sin(2πλt).   (4.1)
A = 2C(λ), B = 2S(λ).
Σ_{t=1}^{n} cos(2π(k/n)t) cos(2π(m/n)t) = { n, if k = m = 0 or n/2; n/2, if k = m ≠ 0 and ≠ n/2; 0, if k ≠ m },
Σ_{t=1}^{n} sin(2π(k/n)t) sin(2π(m/n)t) = { 0, if k = m = 0 or n/2; n/2, if k = m ≠ 0 and ≠ n/2; 0, if k ≠ m },
Σ_{t=1}^{n} cos(2π(k/n)t) sin(2π(m/n)t) = 0.
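These orthogonality relations are easily verified numerically; a minimal PROC IML sketch for the cosine case with n = 8 (an arbitrary choice):

PROC IML;
   n=8; pi=CONSTANT('PI'); t=T(1:n);
   cc=J(5,5,0);
   DO k=0 TO 4;
      DO m=0 TO 4;
         cc[k+1,m+1]=SUM( COS(2*pi*(k/n)*t) # COS(2*pi*(m/n)*t) );
      END;
   END;
   PRINT cc;   /* diagonal: 8 for k=0 and k=n/2, 4 otherwise; 0 off-diagonal */
QUIT;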
(sin(2π(k/n)t))1≤t≤n , k = 1, . . . , [n/2],
and
(cos(2π(k/n)t))1≤t≤n , k = 0, . . . , [n/2],
span the space Rn . Note that by Lemma 4.1.2 in the case of n odd
the above 2[n/2] + 1 = n vectors are linearly independent, precisely,
they are orthogonal, whereas in the case of an even sample size n the
vector (sin(2π(k/n)t))1≤t≤n with k = n/2 is the null vector (0, . . . , 0)
and the remaining n vectors are again orthogonal. As a consequence
we obtain that for a given set of data y1 , . . . , yn , there exist in any
case uniquely determined coefficients Ak and Bk , k = 0, . . . , [n/2],
with B0 := 0 and, if n is even, Bn/2 = 0 as well, such that
y_t = Σ_{k=0}^{[n/2]} ( A_k cos(2π(k/n)t) + B_k sin(2π(k/n)t) ),  t = 1, . . . , n.   (4.2)
and, thus,
c_k = y^T v_k / ||v_k||²,  k = 1, . . . , n.
and
Σ_{t=1}^{n} (y_t − ȳ)² = (n/2) Σ_{k=1}^{[n/2]} ( A_k² + B_k² ).
(i) I(0) = 0,
(ii) I(−λ) = I(λ), λ ∈ R,
(iii) I has the period 1.
Proof. Part (i) follows from sin(0) = 0 and cos(0) = 1, while (ii) is a
consequence of cos(−x) = cos(x), sin(−x) = − sin(x), x ∈ R. Part
(iii) follows from the fact that 2π is the fundamental period of sin and
cos.
Theorem 4.2.1 implies that the function I(λ) is completely determined
by its values on [0, 0.5]. The periodogram is often defined on the scale
[0, 2π] instead of [0, 1] by putting I ∗ (ω) := 2I(ω/(2π)); this version is,
for example, used in SAS. In view of Theorem 4.2.4 below we prefer
I(λ), however.
The following figure displays the periodogram of the Star Data from
Example 4.1.1. It has two obvious peaks at the Fourier frequencies
21/600 = 0.035 ≈ 1/28.57 and 25/600 = 1/24 ≈ 0.04167. This
indicates that essentially two cycles with period 24 and 28 or 29 are
inherent in the data. A least squares approach for the determination of
the coefficients Ai , Bi , i = 1, 2 with frequencies 1/24 and 1/29 as done
in Program 4.1.1 (star harmonic.sas) then leads to the coefficients in
Example 4.1.1.
[SAS output: the coefficient COS_01 = 34.1933, followed by a table with columns PERIOD, COS_01, SIN_01, P and LAMBDA for the six largest periodogram values]
The first part of the output is the coefficient Ã0 which is equal to
two times the mean of the lumen data. The results for the Fourier
frequencies with the six greatest periodogram values constitute the
second part of the output. Note that the meaning of COS 01 and
SIN 01 are slightly different from the definitions of Ak and Bk in
(4.3), because SAS lets the index run from 0 to n − 1 instead of 1 to
n.
(ii) f_a(−λ) and f_a(λ) are conjugate complex numbers, i.e., f_a(−λ) equals the complex conjugate of f_a(λ),
1 /* bankruptcy_correlogram.sas */
2 TITLE1 ’Correlogram’;
3 TITLE2 ’Bankruptcy Data’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\bankrupt.txt’;
8 INPUT year bankrupt;
9
10 /* Compute autocorrelation function */
11 PROC ARIMA DATA=data1;
12 IDENTIFY VAR=bankrupt NLAG=64 OUTCOV=corr NOPRINT;
13
14 /* Graphical options */
15 AXIS1 LABEL=(’r(k)’);
16 AXIS2 LABEL=(’k’);
17 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.4 W=1;
18
19 /* Plot auto correlation function */
20 PROC GPLOT DATA=corr;
21 PLOT CORR*LAG / VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
22 RUN; QUIT;
After reading the data from an external file into a data step, the procedure ARIMA calculates the empirical autocorrelation function and stores them into a new data set. The correlogram is generated using PROC GPLOT.
1 /* bankruptcy_periodogram.sas */
2 TITLE1 ’Periodogram’;
3 TITLE2 ’Bankruptcy Data’;
4
5 /* Read in the data */
6 DATA data1;
7 INFILE ’c:\data\bankrupt.txt’;
8 INPUT year bankrupt;
9
10 /* Compute the periodogram */
11 PROC SPECTRA DATA=data1 P OUT=data2;
12 VAR bankrupt;
13
14 /* Adjusting different periodogram definitions */
15 DATA data3;
16 SET data2(FIRSTOBS=2);
17 p=P_01/2;
18 lambda=FREQ/(2*CONSTANT(’PI’));
19
20 /* Graphical options */
21 SYMBOL1 V=NONE C=GREEN I=JOIN;
22 AXIS1 LABEL=(’I’ F=CGREEK ’(l)’) ;
23 AXIS2 ORDER=(0 TO 0.5 BY 0.05) LABEL=(F=CGREEK ’l’);
24
25 /* Plot the periodogram */
26 PROC GPLOT DATA=data3;
27 PLOT p*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
28 RUN; QUIT;
This program again first reads the data and then starts a spectral analysis by PROC SPECTRA. Due to the reasons mentioned in the comments to Program 4.2.1 (star periodogram.sas) there are some transformations of the periodogram and the frequency values generated by PROC SPECTRA done in data3. The graph results from the statements in PROC GPLOT.
I(λ) = c(0) + 2 Σ_{k=1}^{n−1} c(k) cos(2πλk)
     = Σ_{k=−(n−1)}^{n−1} c(k) e^{−i2πλk}.
Proof. From cos(x_1 − x_2) = cos(x_1)cos(x_2) + sin(x_1)sin(x_2) for x_1, x_2 ∈ R we obtain
I(λ) = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} (y_s − ȳ)(y_t − ȳ) ( cos(2πλs)cos(2πλt) + sin(2πλs)sin(2πλt) )
     = (1/n) Σ_{s=1}^{n} Σ_{t=1}^{n} a_{st},
where ast := (ys − ȳ)(yt − ȳ) cos(2πλ(s − t)). Since ast = ats and
cos(0) = 1 we have moreover
I(λ) = (1/n) Σ_{t=1}^{n} a_{tt} + (2/n) Σ_{k=1}^{n−1} Σ_{j=1}^{n−k} a_{j,j+k}
     = (1/n) Σ_{t=1}^{n} (y_t − ȳ)² + 2 Σ_{k=1}^{n−1} ( (1/n) Σ_{j=1}^{n−k} (y_j − ȳ)(y_{j+k} − ȳ) ) cos(2πλk)
     = c(0) + 2 Σ_{k=1}^{n−1} c(k) cos(2πλk).
since
∫_0^1 e^{i2πλ(t−s)} dλ = 1 if s = t, and 0 if s ≠ t.
Aliasing
Suppose that we observe a continuous time process (Zt )t∈R only through
its values at k∆, k ∈ Z, where ∆ > 0 is the sampling interval,
i.e., we actually observe (Yk )k∈Z = (Zk∆ )k∈Z . Take, for example,
Zt := cos(2π(9/11)t), t ∈ R. The following figure shows that at
k ∈ Z, i.e., ∆ = 1, the observations Zk coincide with Xk , where
Xt := cos(2π(2/11)t), t ∈ R. With the sampling interval ∆ = 1,
the observations Zk with high frequency 9/11 can, therefore, not be
distinguished from the Xk , which have low frequency 2/11.
1 /* aliasing.sas */
2 TITLE1 ’Aliasing’;
3
4 /* Generate harmonic waves */
5 DATA data1;
6 DO t=1 TO 14 BY .01;
7 y1=COS(2*CONSTANT(’PI’)*2/11*t);
8 y2=COS(2*CONSTANT(’PI’)*9/11*t);
9 OUTPUT;
10 END;
11
12 /* Generate points of intersection */
13 DATA data2;
14 DO t0=1 TO 14;
15 y0=COS(2*CONSTANT(’PI’)*2/11*t0);
16 OUTPUT;
17 END;
18
19 /* Merge the data sets */
20 DATA data3;
21 MERGE data1 data2;
22
23 /* Graphical options */
24 SYMBOL1 V=DOT C=GREEN I=NONE H=.8;
25 SYMBOL2 V=NONE C=RED I=JOIN;
26 AXIS1 LABEL=NONE;
27 AXIS2 LABEL=(’t’);
28
29 /* Plot the curves with point of intersection */
30 PROC GPLOT DATA=data3;
31 PLOT y0*t0=1 y1*t=2 y2*t=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
32 RUN; QUIT;
In the first data step a tight grid for the cosine waves with frequencies 2/11 and 9/11 is generated. In the second data step the values of the cosine wave with frequency 2/11 are generated just for integer values of t symbolizing the observation points. After merging the two data sets the two waves are plotted using the JOIN option in the SYMBOL statement while the values at the observation points are displayed in the same graph by dot symbols.
Exercises
4.1. Let y(t) = A cos(2πλt) + B sin(2πλt) be a harmonic component.
Show that y can be written as y(t) = α cos(2πλt − ϕ), where α is the
amplitude, i.e., the maximum departure of the wave from zero, and ϕ
is the phase displacement.
4.2. Show that
Σ_{t=1}^{n} cos(2πλt) = { n, if λ ∈ Z; cos(πλ(n + 1)) sin(πλn)/sin(πλ), if λ ∉ Z },
Σ_{t=1}^{n} sin(2πλt) = { 0, if λ ∈ Z; sin(πλ(n + 1)) sin(πλn)/sin(πλ), if λ ∉ Z }.
Hint: Compute Σ_{t=1}^{n} e^{i2πλt}, where e^{iϕ} = cos(ϕ) + i sin(ϕ) is the complex valued exponential function.
4.3. Verify Lemma 4.1.2. Hint: Exercise 4.2.
4.4. Suppose that the time series (yt )t satisfies the additive model
with seasonal component
s(t) = Σ_{k=1}^{s} ( A_k cos(2π(k/s)t) + B_k sin(2π(k/s)t) ).
4.7. (Unemployed1 Data) Plot the periodogram of the first order dif-
ferences of the numbers of unemployed in the building trade as intro-
duced in Example 1.1.1.
4.8. (Airline Data) Plot the periodogram of the variance stabilized
and trend adjusted Airline Data, introduced in Example 1.3.1. Add
a seasonal adjustment and compare the periodograms.
4.9. The contribution of the autocovariance c(k), k ≥ 1, to the pe-
riodogram can be illustrated by plotting the functions ± cos(2πλk),
λ ∈ [0, 0.5].
4.13. (Star Data) Suppose that the Star Data are only observed
weekly (i.e., keep only every seventh observation). Is an aliasing effect
observable?
Chapter 5
The Spectrum of a Stationary Process
In this chapter we investigate the spectrum of a real valued station-
ary process, which is the Fourier transform of its (theoretical) auto-
covariance function. Its empirical counterpart, the periodogram, was
investigated in the preceding sections, cf. Theorem 4.2.3.
Let (Yt )t∈Z be a (real valued) stationary process with an absolutely
summable autocovariance function γ(t), t ∈ Z. Its Fourier transform
f(λ) := Σ_{t∈Z} γ(t) e^{−i2πλt} = γ(0) + 2 Σ_{t∈N} γ(t) cos(2πλt),  λ ∈ R,
For t = 0 we obtain
γ(0) = ∫_0^1 f(λ) dλ,
which shows that the spectrum is a decomposition of the variance
γ(0). In Section 5.3 we will in particular compute the spectrum of
an ARMA-process. As a preparatory step we investigate properties
of spectra for arbitrary absolutely summable filters.
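As a quick numerical illustration, the defining series can be summed for the well-known AR(1) autocovariances γ(t) = σ² a^{|t|}/(1 − a²), which are not derived at this point, and compared with the closed form σ²/(1 − 2a cos(2πλ) + a²) obtained later in Section 5.3; a PROC IML sketch:

PROC IML;
   a=0.5; sigma2=1; pi=CONSTANT('PI'); lambda=0.3;
   g0=sigma2/(1-a**2);                  /* gamma(0) of an AR(1)-process */
   f=g0;
   DO t=1 TO 200;                       /* truncated Fourier series */
      f=f+2*g0*(a**t)*COS(2*pi*lambda*t);
   END;
   fclosed=sigma2/(1-2*a*COS(2*pi*lambda)+a**2);
   PRINT f fclosed;   /* both values agree up to truncation error */
QUIT;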
zero and covariance matrix K (n) . Define now for each n ∈ N and t ∈ Z
a distribution function on Rn by
Ft+1,...,t+n (v1 , . . . , vn ) := P {V1 ≤ v1 , . . . , Vn ≤ vn }.
This defines a family of distribution functions indexed by consecutive
integers. Let now t1 < · · · < tm be arbitrary integers. Choose t ∈ Z
and n ∈ N such that ti = t + ni , where 1 ≤ n1 < · · · < nm ≤ n. We
define now
Ft1 ,...,tm ((vi )1≤i≤m ) := P {Vni ≤ vi , 1 ≤ i ≤ m}.
Note that Ft1 ,...,tm does not depend on the special choice of t and n
and thus, we have defined a family of distribution functions indexed
by t1 < · · · < tm on Rm for each m ∈ N, which obviously satisfies the
consistency condition of Kolmogorov’s theorem. This result implies
the existence of a process (Vt )t∈Z , whose finite dimensional distribution
at t1 < · · · < tm has distribution function Ft1 ,...,tm . This process
has, therefore, mean vector zero and covariances E(Vt+h Vt ) = K(h),
h ∈ Z.
If F(λ) = ∫_0^λ f(x) dx for 0 ≤ λ ≤ 1, then f is called the spectral density of γ.
Note that the property Σ_{h≥0} |γ(h)| < ∞ already implies the existence
of a spectral density of γ, cf. Theorem 4.2.5 and the proof of Corollary 5.1.5.
Recall that γ(0) = ∫_0^1 dF(λ) = F(1) and thus, the autocorrelation
function ρ(h) = γ(h)/γ(0) has the above integral representation, but
with F replaced by the probability distribution function F/γ(0).
Proof of Theorem 5.1.2. We establish first the uniqueness of F . Let
G be another measure generating function with G(λ) = 0 for λ ≤ 0
and constant for λ ≥ 1 such that
γ(h) = ∫_0^1 e^{i2πλh} dF(λ) = ∫_0^1 e^{i2πλh} dG(λ),  h ∈ Z.
Suppose now that γ has the representation (5.3). We have for arbi-
trary xi ∈ R, i = 1, . . . , n
Σ_{1≤r,s≤n} x_r γ(r − s) x_s = ∫_0^1 Σ_{1≤r,s≤n} x_r x_s e^{i2πλ(r−s)} dF(λ)
= ∫_0^1 | Σ_{r=1}^{n} x_r e^{i2πλr} |² dF(λ) ≥ 0,
Put now
F_N(λ) := ∫_0^λ f_N(x) dx,  0 ≤ λ ≤ 1.
Then we have for each h ∈ Z
∫_0^1 e^{i2πλh} dF_N(λ) = Σ_{|m|<N} (1 − |m|/N) γ(m) ∫_0^1 e^{i2πλ(h−m)} dλ
= { (1 − |h|/N) γ(h), if |h| < N; 0, if |h| ≥ N }.   (5.4)
Since FN (1) = γ(0) < ∞ for any N ∈ N, we can apply Helly’s selec-
tion theorem (cf. Billingsley, 1968, page 226ff) to deduce the existence
of a measure generating function F̃ and a subsequence (FNk )k such
that FNk converges weakly to F̃ i.e.,
∫_0^1 g(λ) dF_{N_k}(λ) → ∫_0^1 g(λ) dF̃(λ),  k → ∞,
Proof. Theorem 5.1.2 shows that (i) and (ii) are equivalent. The
assertion is then a consequence of Theorem 5.1.1.
Corollary 5.1.5. A symmetric function γ : Z → R with Σ_{t∈Z} |γ(t)| < ∞
is the autocovariance function of a stationary process iff
f(λ) := Σ_{t∈Z} γ(t) e^{−i2πλt} ≥ 0,  λ ∈ [0, 1].
(see Exercise 5.13 and Theorem 4.2.2). This power transfer function
is plotted in Plot 5.2.1 below. It shows that frequencies λ close to
zero i.e., those corresponding to a large period, remain essentially
unaltered. Frequencies λ close to 0.5, which correspond to a short
period, are, however, damped by the approximate factor 0.1, when the
moving average (au ) is applied to a process. The frequency λ = 1/3
is completely eliminated, since ga (1/3) = 0.
1 /* power_transfer_sma3.sas */
2 TITLE1 ’Power transfer function’;
3 TITLE2 ’of the simple moving average of length 3’;
4
5 /* Compute power transfer function */
6 DATA data1;
7 DO lambda=.001 TO .5 BY .001;
8 g=(SIN(3*CONSTANT(’PI’)*lambda)/(3*SIN(CONSTANT(’PI’)*lambda)))
,→**2;
9 OUTPUT;
10 END;
11
12 /* Graphical options */
13 AXIS1 LABEL=(’g’ H=1 ’a’ H=2 ’(’ F=CGREEK ’l)’);
14 AXIS2 LABEL=(F=CGREEK ’l’);
15 SYMBOL1 V=NONE C=GREEN I=JOIN;
16
17 /* Plot power transfer function */
18 PROC GPLOT DATA=data1;
19 PLOT g*lambda / VAXIS=AXIS1 HAXIS=AXIS2;
20 RUN; QUIT;
f_a(λ) = 1 − e^{−i2πλ}.
Since
f_a(λ) = e^{−iπλ}( e^{iπλ} − e^{−iπλ} ) = i e^{−iπλ} 2 sin(πλ),
Plot 5.2.2: Power transfer function of the first order difference filter.
Since sin2 (x) = 0 iff x = kπ and sin2 (x) = 1 iff x = (2k + 1)π/2,
k ∈ Z, the power transfer function ga(s) (λ) satisfies for k ∈ Z
g_a^{(s)}(λ) = { 0, iff λ = k/s; 4, iff λ = (2k + 1)/(2s) }.
where λ0 is the cut off frequency in the first two cases and [λ0 −
∆, λ0 + ∆] is the cut off interval with bandwidth 2∆ > 0 in the final
one. Therefore, the question naturally arises, whether there actually
exist filters, which have a prescribed power transfer function. One
possible approach for fitting a linear filter with weights au to a given
transfer function f is offered by utilizing least squares. Since only
filters of finite length matter in applications, one chooses a transfer
function
f_a(λ) = Σ_{u=r}^{s} a_u e^{−i2πλu}
with fixed integers r, s and fits this function fa to f by minimizing
the integrated squared error
∫_0^{0.5} |f(λ) − f_a(λ)|² dλ
in (au )r≤u≤s ∈ Rs−r+1 . This is achieved for the choice (Exercise 5.16)
a_u = 2 Re( ∫_0^{0.5} f(λ) e^{i2πλu} dλ ),  u = r, . . . , s,
Example 5.2.5. For the low pass filter with cut off frequency 0 <
λ0 < 0.5 and ideal transfer function
Plot 5.2.4: Transfer function of least squares fitted low pass filter with
cut off frequency λ0 = 1/10 and r = −20, s = 20.
1 /* transfer.sas */
2 TITLE1 ’Transfer function’;
3 TITLE2 ’of least squares fitted low pass filter’;
4
5 /* Compute transfer function */
6 DATA data1;
7 DO lambda=0 TO .5 BY .001;
8 f=2*1/10;
9 DO u=1 TO 20;
10 f=f+2*1/(CONSTANT(’PI’)*u)*SIN(2*CONSTANT(’PI’)*1/10*u)*COS(2*
,→CONSTANT(’PI’)*lambda*u);
11 END;
12 OUTPUT;
13 END;
14
15 /* Graphical options */
16 AXIS1 LABEL=(’f’ H=1 ’a’ H=2 F=CGREEK ’(l)’);
17 AXIS2 LABEL=(F=CGREEK ’l’);
18 SYMBOL1 V=NONE C=GREEN I=JOIN L=1;
19
20 /* Plot transfer function */
21 PROC GPLOT DATA=data1;
22 PLOT f*lambda / VAXIS=AXIS1 HAXIS=AXIS2 VREF=0;
23 RUN; QUIT;
The programs in Section 5.2 (Linear Filters and Frequencies) are just made for the purpose of generating graphics, which demonstrate the shape of power transfer functions or, in case of Program 5.2.4 (transfer.sas), of a transfer function. They all consist of two parts, a DATA step and a PROC step. In the DATA step values of the power transfer function are calculated and stored in a variable g by a DO loop over lambda from 0 to 0.5 with a small increment. In Program 5.2.4 (transfer.sas) it is necessary to use a second DO loop within the first one to calculate the sum used in the definition of the transfer function f. Two AXIS statements defining the axis labels and a SYMBOL statement precede the procedure PLOT, which generates the plot of g or f versus lambda.
variance σ 2 . Put
A(z) := 1 − a1 z − a2 z 2 − · · · − ap z p ,
B(z) := 1 + b1 z + b2 z 2 + · · · + bq z q
and suppose that the process (Yt ) satisfies the stationarity condition
(2.4), i.e., the roots of the equation A(z) = 0 are outside of the unit
circle. The process (Yt ) then has the spectral density
f_Y(λ) = σ² |B(e^{−i2πλ})|² / |A(e^{−i2πλ})|²
       = σ² |1 + Σ_{v=1}^{q} b_v e^{−i2πλv}|² / |1 − Σ_{u=1}^{p} a_u e^{−i2πλu}|².   (5.7)
Y_t = aY_{t−1} + ε_t + bε_{t−1}
then has the spectral density
f_Y(λ) = σ² ( 1 + 2b cos(2πλ) + b² ) / ( 1 − 2a cos(2πλ) + a² ).
1 /* arma11_sd.sas */
2 TITLE1 ’Spectral densities of ARMA(1,1)-processes’;
3
4 /* Compute spectral densities of ARMA(1,1)-processes */
5 DATA data1;
6 a=.5;
7 DO b=-.9, -.2, 0, .2, .5;
8 DO lambda=0 TO .5 BY .005;
9 f=(1+2*b*COS(2*CONSTANT(’PI’)*lambda)+b*b)/(1-2*a*COS(2*CONSTANT
,→(’PI’)*lambda)+a*a);
10 OUTPUT;
11 END;
12 END;
13
14 /* Graphical options */
15 AXIS1 LABEL=(’f’ H=1 ’Y’ H=2 F=CGREEK ’(l)’);
16 AXIS2 LABEL=(F=CGREEK ’l’);
17 SYMBOL1 V=NONE C=GREEN I=JOIN L=4;
18 SYMBOL2 V=NONE C=GREEN I=JOIN L=3;
19 SYMBOL3 V=NONE C=GREEN I=JOIN L=2;
20 SYMBOL4 V=NONE C=GREEN I=JOIN L=33;
21 SYMBOL5 V=NONE C=GREEN I=JOIN L=1;
22 LEGEND1 LABEL=(’a=0.5, b=’);
23
Exercises
5.1. Formulate and prove Theorem 5.1.1 for Hermitian functions K
and complex-valued stationary processes. Hint for the sufficiency part:
Let K1 be the real part and K2 be the imaginary part of K. Consider
the real-valued 2n × 2n-matrices
M^{(n)} = (1/2) (  K_1^{(n)}   K_2^{(n)}
                  −K_2^{(n)}   K_1^{(n)} ),   K_l^{(n)} = ( K_l(r − s) )_{1≤r,s≤n},  l = 1, 2.
distributions
Ft+1,...,t+n (v1 , w1 , . . . , vn , wn )
:= P {V1 ≤ v1 , W1 ≤ w1 , . . . , Vn ≤ vn , Wn ≤ wn },
t ∈ Z. By Kolmogorov’s theorem there exists a bivariate Gaussian
process (Vt , Wt )t∈Z with mean vector zero and covariances
E(V_{t+h} V_t) = E(W_{t+h} W_t) = (1/2) K_1(h),
E(V_{t+h} W_t) = −E(W_{t+h} V_t) = (1/2) K_2(h).
Conclude by showing that the complex-valued process Yt := Vt − iWt ,
t ∈ Z, has the autocovariance function K.
5.2. Suppose that A is a real positive semidefinite n × n-matrix, i.e.,
x^T A x ≥ 0 for x ∈ R^n. Show that A is also positive semidefinite for
complex vectors, i.e., z^T A z̄ ≥ 0 for z ∈ C^n.
5.3. Use (5.3) to show that for 0 < a < 0.5
γ(h) = { sin(2πah)/(2πh), h ∈ Z \ {0}; a, h = 0 }
is the autocovariance function of a stationary process. Compute its
spectral density.
5.4. Compute the autocovariance function of a stationary process with
spectral density
f(λ) = ( 0.5 − |λ − 0.5| ) / 0.5²,  0 ≤ λ ≤ 1.
5.5. Suppose that F and G are measure generating functions defined
on some interval [a, b] with F (a) = G(a) = 0 and
∫_{[a,b]} ψ(x) F(dx) = ∫_{[a,b]} ψ(x) G(dx)
5.7. A real valued stationary process (Yt )t∈Z is supposed to have the
spectral density f (λ) = a + bλ, λ ∈ [0, 0.5]. Which conditions must be
satisfied by a and b? Compute the autocovariance function of (Yt )t∈Z .
5.9. Suppose that (Yt )t∈Z and (Zt )t∈Z are stationary processes such
that Yr and Zs are uncorrelated for arbitrary r, s ∈ Z. Denote by FY
and FZ the pertaining spectral distribution functions and put Xt :=
Yt + Zt , t ∈ Z. Show that the process (Xt ) is also stationary and
compute its spectral distribution function.
5.10. Let (Y_t) be a real valued stationary process with spectral dis-
tribution function F. Show that for any function g : [−0.5, 0.5] → C
with ∫_0^1 |g(λ − 0.5)| dF(λ) < ∞
∫_0^1 g(λ − 0.5) dF(λ) = ∫_0^1 g(0.5 − λ) dF(λ).
In particular we have
Hint: Verify the equality first for g(x) = exp(i2πhx), h ∈ Z, and then
use the fact that, on compact sets, the trigonometric polynomials
are uniformly dense in the space of continuous functions, which in
turn form a dense subset in the space of square integrable functions.
Finally, consider the function g(x) = 1[0,ξ] (x), 0 ≤ ξ ≤ 0.5 (cf. the
hint in Exercise 5.5).
5.11. Let (Xt ) and (Yt ) be stationary processes with mean zero and
absolute summable covariance functions. If their spectral densities fX
and fY satisfy fX (λ) ≤ fY (λ) for 0 ≤ λ ≤ 1, show that
(i) Γn,Y −Γn,X is a positive semidefinite matrix, where Γn,X and Γn,Y
are the covariance matrices of (X1 , . . . , Xn )T and (Y1 , . . . , Yn )T
respectively, and
(ii) Var(aT (X1 , . . . , Xn )) ≤ Var(aT (Y1 , . . . , Yn )) for all vectors a =
(a1 , . . . , an )T ∈ Rn .
5.12. Compute the gain function of the filter
a_u = { 1/4, u ∈ {−1, 1}; 1/2, u = 0; 0, elsewhere }.
5.13. The simple moving average
a_u = { 1/(2q + 1), |u| ≤ q; 0, elsewhere }
has the gain function
g_a(λ) = { 1, λ = 0; ( sin((2q + 1)πλ) / ((2q + 1) sin(πλ)) )², λ ∈ (0, 0.5] }.
Is this filter for large q a low pass filter? Plot its power transfer
functions for q = 5, 10, 20. Hint: Exercise 4.2.
5.14. Compute the gain function of the exponential smoothing filter
a_u = { α(1 − α)^u, u ≥ 0; 0, u < 0 },
where 0 < α < 1. Plot this function for various α. What is the effect
of α → 0 or α → 1?
5.15. Let (X_t)_{t∈Z} be a stationary process, let (a_u)_{u∈Z} be an absolutely
summable filter and put Y_t := Σ_{u∈Z} a_u X_{t−u}, t ∈ Z. If (b_w)_{w∈Z} is an-
other absolutely summable filter, then the process Z_t = Σ_{w∈Z} b_w Y_{t−w}
has the spectral distribution function
F_Z(λ) = ∫_0^λ |F_a(µ)|² |F_b(µ)|² dF_X(µ)
Hint: Put f (λ) = f1 (λ) + if2 (λ) and differentiate with respect to au .
5.18. An AR(2)-process
Yt = a1 Yt−1 + a2 Yt−2 + εt
satisfying the stationarity condition (2.4) (cf. Exercise 2.25) has the
spectral density
σ2
fY (λ) = .
1 + a21 + a22 + 2(a1 a2 − a1 ) cos(2πλ) − 2a2 cos(4πλ)
Plot this function for various choices of a1 , a2 .
C_ε(k/n) = (1/n) Σ_{t=1}^{n} (ε_t − ε̄) cos(2π(k/n)t),
S_ε(k/n) = (1/n) Σ_{t=1}^{n} (ε_t − ε̄) sin(2π(k/n)t)
Proof. Put
Vj := Sj − Sj−1 , j = 1, . . . , m.
By Theorem 6.1.3 the vector (V1 , . . . , Vm ) is distributed like the length
of the m consecutive intervals into which [0, 1] is partitioned by the
m − 1 random points U1 , . . . , Um−1 :
Fisher’s Test
The preceding results suggest to test the hypothesis Yt = εt with εt
independent and N (µ, σ 2 )-distributed, by testing for the uniform dis-
tribution on [0, 1]. Precisely, we will reject this hypothesis if Fisher’s
κ-statistic
max1≤j≤m I(j/n)
κm := = mMm
(1/m) m
P
k=1 I(k/n)
is significantly large, i.e., if one of the values I(j/n) is significantly
larger than the average over all. The hypothesis is, therefore, rejected
at error level α if
c
α
κm > cα with 1 − Gm = α.
m
This is Fisher's test for hidden periodicities. Common values are
α = 0.01 and α = 0.05. Table 6.1.1, taken from Fuller (1995), lists
several critical values c_α.
Note that these quantiles can be approximated by the corresponding
quantiles of a Gumbel distribution if m is large (Exercise 6.12).
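A minimal PROC IML sketch of this approximation: under the hypothesis the scaled periodogram ordinates behave like independent standard exponential variables, so κ_m can be mimicked by m such variables, and the (1 − α)-quantile of the Gumbel limit yields the approximate critical value c_α ≈ ln(m) − ln(−ln(1 − α)):

PROC IML;
   CALL RANDSEED(1);
   m=100; alpha=0.05;
   I=J(m,1,.);
   CALL RANDGEN(I,'EXPONENTIAL');      /* stand-ins for the periodogram values */
   kappa=MAX(I)/I[:];                  /* Fisher's kappa = m*M_m */
   c_alpha=LOG(m)-LOG(-LOG(1-alpha));  /* Gumbel-approximate critical value */
   PRINT kappa c_alpha;                /* reject if kappa > c_alpha */
QUIT;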
1 /* airline_whitenoise_plot.sas */
2 TITLE1 ’Visualisation of the test for white noise’;
3 TITLE2 ’for the trend und seasonal adjusted’;
4 TITLE3 ’Airline Data’;
5 /* Note that this program needs data2 generated by the previous
,→program (airline_whitenoise.sas) */
6
7 /* Calculate the sum of the periodogram */
8 PROC MEANS DATA=data2(FIRSTOBS=2) NOPRINT;
9 VAR P_01;
10 OUTPUT OUT=data3 SUM=psum;
11
12 /* Compute empirical distribution function of cumulated periodogram
,→and its confidence bands */
13 DATA data4;
14 SET data2(FIRSTOBS=2);
15 IF _N_=1 THEN SET data3;
16 RETAIN s 0;
17 s=s+P_01/psum;
18 fm=_N_/(_FREQ_-1);
19 yu_01=fm+1.63/SQRT(_FREQ_-1);
20 yl_01=fm-1.63/SQRT(_FREQ_-1);
21 yu_05=fm+1.36/SQRT(_FREQ_-1);
22 yl_05=fm-1.36/SQRT(_FREQ_-1);
23
24 /* Graphical options */
25 SYMBOL1 V=NONE I=STEPJ C=GREEN;
26 SYMBOL2 V=NONE I=JOIN C=RED L=2;
27 SYMBOL3 V=NONE I=JOIN C=RED L=1;
28 AXIS1 LABEL=(’x’) ORDER=(.0 TO 1.0 BY .1);
29 AXIS2 LABEL=NONE;
30
31 /* Plot empirical distribution function of cumulated periodogram with
,→its confidence bands */
32 PROC GPLOT DATA=data4;
33 PLOT fm*s=1 yu_01*fm=2 yl_01*fm=2 yu_05*fm=3 yl_05*fm=3 / OVERLAY
,→HAXIS=AXIS1 VAXIS=AXIS2;
34 RUN; QUIT;
This program uses the data set data2 created by Program 6.1.1 (airline whitenoise.sas), where the first observation belonging to the frequency 0 is dropped. PROC MEANS calculates the sum (keyword SUM) of the SAS periodogram variable P_01 and stores it in the variable psum of the data set data3. The NOPRINT option suppresses the printing of the output. The next DATA step combines every observation of data2 with this sum by means of the IF statement. Furthermore a variable s is initialized with the value 0 by the RETAIN statement and then the portion of each periodogram value from the sum is cumulated. The variable fm contains the values of the empirical distribution function calculated by means of the automatically generated variable _N_ containing the number of observations and the variable _FREQ_, which was created by PROC MEANS and contains the number m. The values of the upper and lower band are stored in the y variables. The last part of this program contains SYMBOL and AXIS statements and PROC GPLOT to visualize the Bartlett-Kolmogorov-Smirnov statistic. The empirical distribution of the cumulated periodogram is represented as a step function due to the I=STEPJ option in the SYMBOL1 statement.
with Ȳ_n := (1/n) Σ_{t=1}^{n} Y_t and the sample autocovariance function
c(h) = (1/n) Σ_{t=1}^{n−|h|} ( Y_t − Ȳ_n )( Y_{t+|h|} − Ȳ_n ).
I_n(k/n) = Σ_{|h|<n} ( (1/n) Σ_{t=1}^{n−|h|} (Y_t − µ)(Y_{t+|h|} − µ) ) e^{−i2π(k/n)h}
= (1/n) Σ_{t=1}^{n} (Y_t − µ)²   (6.3)
+ 2 Σ_{h=1}^{n−1} ( (1/n) Σ_{t=1}^{n−|h|} (Y_t − µ)(Y_{t+|h|} − µ) ) cos(2π(k/n)h).
E(I_n(λ)) = Σ_{|h|<n} ( (1/n) Σ_{t=1}^{n−|h|} E((Y_t − µ)(Y_{t+|h|} − µ)) ) e^{−i2πg_n(λ)h}
= Σ_{|h|<n} ( 1 − |h|/n ) γ(h) e^{−i2πg_n(λ)h}.
Since Σ_{h∈Z} |γ(h)| < ∞, the series Σ_{|h|<n} γ(h) exp(−i2πλh) converges
to f(λ) uniformly for 0 ≤ λ ≤ 0.5. Kronecker's Lemma (Exercise 6.8)
implies moreover
| Σ_{|h|<n} (|h|/n) γ(h) e^{−i2πλh} | ≤ Σ_{|h|<n} (|h|/n) |γ(h)| → 0,  n → ∞,
so that f_n(λ) := Σ_{|h|<n} (1 − |h|/n) γ(h) e^{−i2πλh} converges to f(λ) uniformly in λ as well. From g_n(λ) → λ as n → ∞ and the
continuity of f we obtain for λ ∈ (0, 0.5]
| E(I_n(λ)) − f(λ) | = | f_n(g_n(λ)) − f(λ) |
≤ | f_n(g_n(λ)) − f(g_n(λ)) | + | f(g_n(λ)) − f(λ) | → 0,  n → ∞.
Note that |gn (λ) − λ| ≤ 1/(2n). The uniform convergence in case of
µ = 0 then follows from the uniform convergence of gn (λ) to λ and
the uniform continuity of f on the compact interval [0, 0.5].
We have
E(Z_s Z_t Z_u Z_v) = { ησ⁴, if s = t = u = v; σ⁴, if s = t ≠ u = v, s = u ≠ t = v, or s = v ≠ t = u; 0, elsewhere }
and
e^{−i2π(j/n)(s−t)} e^{−i2π(k/n)(u−v)} = { 1, if s = t, u = v; e^{−i2π((j+k)/n)s} e^{i2π((j+k)/n)t}, if s = u, t = v; e^{−i2π((j−k)/n)s} e^{i2π((j−k)/n)t}, if s = v, t = u }.
This implies
E(I_n(j/n) I_n(k/n))
= ησ⁴/n + (σ⁴/n²) { n(n − 1) + | Σ_{t=1}^{n} e^{−i2π((j+k)/n)t} |² + | Σ_{t=1}^{n} e^{−i2π((j−k)/n)t} |² − 2n }
= (η − 3)σ⁴/n + σ⁴ { 1 + (1/n²) | Σ_{t=1}^{n} e^{i2π((j+k)/n)t} |² + (1/n²) | Σ_{t=1}^{n} e^{i2π((j−k)/n)t} |² },
from which (6.7) and (6.8) follow by using (4.5) on page 142.
Remark 6.2.3. Theorem 6.2.2 can be generalized to filtered processes
Y_t = Σ_{u∈Z} a_u Z_{t−u}, with (Z_t)_{t∈Z} as in Theorem 6.2.2. In this case one
has to replace σ², which equals by Example 5.1.3 the constant spectral
density f_Z(λ), in (i) by the spectral density f_Y(λ_i), 1 ≤ i ≤ r. If in
addition Σ_{u∈Z} |a_u| |u|^{1/2} < ∞, then we have in (ii) the expansions
Var(I_n(k/n)) = { 2f_Y²(k/n) + O(n^{−1/2}), if k = 0 or k = n/2; f_Y²(k/n) + O(n^{−1/2}), elsewhere }
and
Cov(I_n(j/n), I_n(k/n)) = O(n^{−1}),  j ≠ k,
where In is the periodogram pertaining to Y1 , . . . , Yn . The above terms
O(n−1/2 ) and O(n−1 ) are uniformly bounded in k and j by a constant
C.
We omit the highly technical proof and refer to Brockwell and Davis
(1991, Section 10.3). Recall that the class of processes Y_t = Σ_{u∈Z} a_u Z_{t−u} is a fairly rich
one, which contains in particular ARMA-processes, see Section 2.2
and Remark 2.1.12.
(ii) lim_{n→∞} Cov( f̂_n(λ), f̂_n(µ) ) / Σ_{|j|≤m} a²_{jn}
= { 2f²(λ), if λ = µ = 0 or 0.5; f²(λ), if 0 < λ = µ < 0.5; 0, if λ ≠ µ }.
Condition (iv) in (6.11) on the weights together with (ii) in the preceding result entails that Var(f̂_n(λ)) → 0 as n → ∞ for any λ ∈ [0, 0.5]. Together
with (i) we, therefore, obtain that the mean squared error of f̂_n(λ)
vanishes asymptotically:
MSE(f̂_n(λ)) = E( ( f̂_n(λ) − f(λ) )² ) = Var(f̂_n(λ)) + Bias²(f̂_n(λ)) → 0,  n → ∞.
max_{|j|≤m} | E(I_n(g_n(λ) + j/n)) − f(g_n(λ) + j/n) | < ε/2
Repeating the arguments in the proof of part (i) one shows that
X X
2 2 2
S1 (λ) = ajn f (λ) + o ajn .
|j|≤m |j|≤m
Thus we established the assertion of part (ii) also in the case 0 < λ =
µ < 0.5. The remaining cases λ = µ = 0 and λ = µ = 0.5 are shown
in a similar way (Exercise 6.17).
The preceding result requires zero mean variables Yt . This might, how-
ever, be too restrictive in practice. Due to (4.5), the periodograms
of (Yt )1≤t≤n , (Yt − µ)1≤t≤n and (Yt − Ȳ )1≤t≤n coincide at Fourier fre-
quencies different from zero. At frequency λ = 0, however, they
will differ in general. To estimate f (0) consistently also in the case
µ = E(Y_t) ≠ 0, one puts
f̂_n(0) := a_{0n} I_n(1/n) + 2 Σ_{j=1}^{m} a_{jn} I_n((1 + j)/n).   (6.12)
Each time the value In (0) occurs in the moving average (6.9), it is
replaced by fˆn (0). Since the resulting estimator of the spectral density
involves only Fourier frequencies different from zero, we can assume
without loss of generality that the underlying variables Yt have zero
mean.
1 /* sunspot_dsae.sas */
22 /* Graphical options */
23 SYMBOL1 I=JOIN C=RED V=NONE L=1;
24 AXIS1 LABEL=(F=CGREEK ’l’) ORDER=(0 TO .5 BY .05);
25 AXIS2 LABEL=NONE;
26
27 /* Plot periodogram and estimated spectral density */
28 PROC GPLOT DATA=data3;
29 PLOT p*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
30 PLOT s*lambda / HAXIS=AXIS1 VAXIS=AXIS2;
31 RUN; QUIT;
In the DATA step the data of the sunspots are read into the variable num. Then PROC SPECTRA is applied to this variable, whereby the options P (see Program 6.1.1, airline whitenoise.sas) and S generate a data set stored in data2 containing the periodogram data and the estimation of the spectral density which SAS computes with the weights given in the WEIGHTS statement. Note that SAS automatically normalizes these weights. In the following DATA step the slightly different definition of the periodogram by SAS is being adjusted to the definition used here (see Program 4.2.1, star periodogram.sas). Both plots are then printed with PROC GPLOT.
be an arbitrary sequence of integers with m/n → 0 as n → ∞ and put
a_{jn} := K(j/m) / Σ_{i=−m}^{m} K(i/m),  −m ≤ j ≤ m.   (6.13)
These weights satisfy the conditions (6.11) (Exercise 6.18).
Take for example K(x) := 1 − |x|, −1 ≤ x ≤ 1. Then we obtain
a_{jn} = (m − |j|)/m²,  −m ≤ j ≤ m.
Example 6.2.6. (i) The truncated kernel is defined by
K_T(x) = { 1, |x| ≤ 1; 0, elsewhere }.
1 /* ma1_blackman_tukey.sas */
2 TITLE1 ’Spectral density and Blackman-Tukey estimator’;
3 TITLE2 ’of MA(1)-process’;
4
5 /* Generate MA(1)-process */
6 DATA data1;
7 DO t=0 TO 160;
8 e=RANNOR(1);
9 y=e-.6*LAG(e);
10 OUTPUT;
11 END;
12
13 /* Estimation of spectral density */
14 PROC SPECTRA DATA=data1(FIRSTOBS=2) S OUT=data2;
15 VAR y;
16 WEIGHTS TUKEY 10 0;
17 RUN;
18
19 /* Adjusting different definitions */
20 DATA data3;
21 SET data2;
22 lambda=FREQ/(2*CONSTANT(’PI’));
23 s=S_01/2*4*CONSTANT(’PI’);
24
25 /* Compute underlying spectral density */
26 DATA data4;
27 DO l=0 TO .5 BY .01;
28 f=1-1.2*COS(2*CONSTANT(’PI’)*l)+.36;
29 OUTPUT;
30 END;
31
32 /* Merge the data sets */
33 DATA data5;
34 MERGE data3(KEEP=s lambda) data4;
35
36 /* Graphical options */
37 AXIS1 LABEL=NONE;
38 AXIS2 LABEL=(F=CGREEK ’l’) ORDER=(0 TO .5 BY .1);
39 SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
40 SYMBOL2 I=JOIN C=RED V=NONE L=3;
41
42 /* Plot underlying and estimated spectral density */
43 PROC GPLOT DATA=data5;
44 PLOT f*l=1 s*lambda=2 / OVERLAY VAXIS=AXIS1 HAXIS=AXIS2;
45 RUN; QUIT;
In the first DATA step the realizations of an MA(1)-process with the given parameters are created. Thereby the function RANNOR, which generates standard normally distributed data, and LAG, which accesses the value of e of the preceding loop, are used. As in Program 6.2.1 (sunspot dsae.sas) PROC SPECTRA computes the estimator of the spectral density (after dropping the first observation) by the option S and stores them in data2. The weights used here come from the Tukey–Hanning kernel with a specified bandwidth of m = 10. The second number after the TUKEY option can be used to refine the choice of the bandwidth. Since this is not needed here it is set to 0. The next DATA step adjusts the different definitions of the spectral density used here and by SAS (see Program 4.2.1, star periodogram.sas). The following DATA step generates the values of the underlying spectral density. These are merged with the values of the estimated spectral density and then displayed by PROC GPLOT.
and
ν = 2 / Σ_{|j|≤m} a²_{jn}.
This interval has constant length log(χ21−α/2 (ν)/χ2α/2 (ν)). Note that
Cν,α (k/n) is a level (1 − α)-confidence interval only for log(f (λ)) at
a fixed Fourier frequency λ = k/n, with 0 < k < [n/2], but not
simultaneously for λ ∈ (0, 0.5).
1 /* ma1_logdsae.sas */
2 TITLE1 ’Logarithms of spectral density,’;
3 TITLE2 ’of their estimates and confidence intervals’;
4 TITLE3 ’of MA(1)-process’;
5
6 /* Generate MA(1)-process */
7 DATA data1;
8 DO t=0 TO 160;
9 e=RANNOR(1);
10 y=e-.6*LAG(e);
11 OUTPUT;
12 END;
13
14 /* Estimation of spectral density */
15 PROC SPECTRA DATA=data1(FIRSTOBS=2) S OUT=data2;
16 VAR y; WEIGHTS 1 3 6 9 12 15 18 20 21 21 21 20 18 15 12 9 6 3 1;
17 RUN;
18
19 /* Adjusting different definitions and computation of confidence bands
,→ */
20 DATA data3; SET data2;
21 lambda=FREQ/(2*CONSTANT(’PI’));
22 log_s_01=LOG(S_01/2*4*CONSTANT(’PI’));
23 nu=2/(3763/53361);
24 c1=log_s_01+LOG(nu)-LOG(CINV(.975,nu));
25 c2=log_s_01+LOG(nu)-LOG(CINV(.025,nu));
26
27 /* Compute underlying spectral density */
28 DATA data4;
29 DO l=0 TO .5 BY 0.01;
30 log_f=LOG((1-1.2*COS(2*CONSTANT(’PI’)*l)+.36));
31 OUTPUT;
32 END;
33
34 /* Merge the data sets */
35 DATA data5;
36 MERGE data3(KEEP=log_s_01 lambda c1 c2) data4;
37
38 /* Graphical options */
39 AXIS1 LABEL=NONE;
40 AXIS2 LABEL=(F=CGREEK ’l’) ORDER=(0 TO .5 BY .1);
41 SYMBOL1 I=JOIN C=BLUE V=NONE L=1;
42 SYMBOL2 I=JOIN C=RED V=NONE L=2;
43 SYMBOL3 I=JOIN C=GREEN V=NONE L=33;
44
45 /* Plot underlying and estimated spectral density */
46 PROC GPLOT DATA=data5;
47 PLOT log_f*l=1 log_s_01*lambda=2 c1*lambda=3 c2*lambda=3 / OVERLAY
,→VAXIS=AXIS1 HAXIS=AXIS2;
48 RUN; QUIT;
This program starts identically to Program 6.2.2 (ma1 blackman tukey.sas) with the generation of an MA(1)-process and of the computation of the spectral density estimator. Only this time the weights are directly given to SAS. In the next DATA step the usual adjustment of the frequencies is done. This is followed by the computation of ν according to its definition. The logarithm of the confidence intervals is calculated with the help of the function CINV which returns quantiles of a χ²-distribution with ν degrees of freedom. The rest of the program which displays the logarithm of the estimated spectral density, of the underlying density and of the confidence intervals is analogous to Program 6.2.2 (ma1 blackman tukey.sas).
Exercises
6.1. For independent random variables X, Y having continuous dis-
tribution functions it follows that P {X = Y } = 0. Hint: Fubini’s
theorem.
where εt are independent and standard normal. Plot the data and the
periodogram. Is the hypothesis Yt = εt rejected at level α = 0.01?
6.7. (Share Data) Test the hypothesis that the share data were gener-
ated by independent and identically normal distributed random vari-
ables and plot the periodogramm. Plot also the original data.
6.8. (Kronecker's lemma) Let (a_j)_{j≥0} be an absolutely summable complex valued filter. Show that lim_{n→∞} Σ_{j=0}^{n} (j/n)|a_j| = 0.
(ii) ∫ x^{2k} dN(0, σ²)(x) = 1 · 3 · · · (2k − 1) σ^{2k},  k ∈ N.
(iii) ∫ |x|^{2k+1} dN(0, σ²)(x) = ( 2^{k+1}/√(2π) ) k! σ^{2k+1},  k ∈ N ∪ {0}.
(i) Xn + Yn →D X + c.
(ii) Xn Yn →D cX.
(iii) X_n/Y_n →_D X/c, if c ≠ 0.
This entails in particular that stochastic convergence implies conver-
gence in distribution. The reverse implication is not true in general.
Give an example.
6.12. Show that the distribution function Fm of Fisher’s test statistic
κm satisfies under the condition of independent and identically normal
observations εt
m→∞
Fm (x + ln(m)) = P {κm ≤ x + ln(m)} −→ exp(−e−x ) =: G(x), x ∈ R.
6.15. (Monte Carlo Simulation) For m large we have under the hypothesis
P{ √(m − 1) ∆_{m−1} > c_α } ≈ α.
For different values of m (> 30) generate 1000 times the test statistic
√(m − 1) ∆_{m−1} based on independent random variables and check how
often this statistic exceeds the critical values c_{0.05} = 1.36 and c_{0.01} =
1.63. Hint: Exercise 6.14.
6.16. In the situation of Theorem 6.2.4 show that the spectral density
f of (Yt )t is continuous.
6.17. Complete the proof of Theorem 6.2.4 (ii) for the remaining cases
λ = µ = 0 and λ = µ = 0.5.
6.18. Verify that the weights (6.13) defined via a kernel function sat-
isfy the conditions (6.11).
6.22. Compute the length of the confidence interval Cν,α (k/n) for
fixed α (preferably α = 0.05) but for various ν. For the calculation of
ν use the weights generated by the kernel K(x) = 1 − |x|, −1 ≤ x ≤ 1
(see equation (6.13)).
where
K_n(y) = { n, y ∈ Z; (1/n)( sin(πyn)/sin(πy) )², y ∉ Z }.
(i) K_n(y) ≥ 0,
6.24. (Nile Data) Between 715 and 1284 the river Nile had its lowest
annual minimum levels. These data are among the longest time series
in hydrology. Can the trend removed Nile Data be considered as being
generated by a white noise, or are there hidden periodicities? Estimate
the spectral density in this case. Use discrete spectral estimators as
well as lag window spectral density estimators. Compare with the
spectral density of an AR(1)-process.
Chapter 7
The Box–Jenkins Program: A Case Study
This chapter deals with the practical application of the Box–Jenkins
Program to the Donauwoerth Data, consisting of 7300 discharge mea-
surements from the Donau river at Donauwoerth, specified in cubic
centimeter per second and taken on behalf of the Bavarian State Of-
fice For Environment between January 1st, 1985 and December 31st,
2004. For the purpose of studying, the data have been kindly made
available to the University of Würzburg.
As introduced in Section 2.3, the Box–Jenkins methodology can be
applied to specify an adequate ARMA(p, q)-model Y_t = a_1 Y_{t−1} + · · · +
a_p Y_{t−p} + ε_t + b_1 ε_{t−1} + · · · + b_q ε_{t−q}, t ∈ Z, for the Donauwoerth data
in order to forecast future values of the time series. In short, the
original time series will be adjusted to represent a possible realization
of such a model. Based on the identification methods MINIC, SCAN
and ESACF, appropriate pairs of orders (p, q) are chosen in Section
7.5 and the corresponding model coefficients a1 , . . . , ap and b1 , . . . , bq
are determined. Finally, it is demonstrated by Diagnostic Checking in
Section 7.6 that the resulting model is adequate and forecasts based
on this model are executed in the concluding Section 7.7.
Yet, before starting the program in Section 7.4, some theoretical
preparations have to be carried out. In the first Section 7.1 we intro-
duce the general definition of the partial autocorrelation leading to
the Levinson–Durbin-Algorithm, which will be needed for Diagnostic
Checking. In order to verify whether pure AR(p)- or MA(q)-models
might be appropriate candidates for explaining the sample, we de-
rive in Section 7.2 and 7.3 asymptotic normal behaviors of suitable
estimators of the partial and general autocorrelations.
Partial Correlation
The partial correlation of two square integrable, real valued random
variables X and Y , holding the random variables Z1 , . . . , Zm , m ∈ N,
fixed, is defined as Corr( X − X̂_{Z_1,...,Z_m}, Y − Ŷ_{Z_1,...,Z_m} ),
provided that Var(X − X̂Z1 ,...,Zm ) > 0 and Var(Y − ŶZ1 ,...,Zm ) > 0,
where X̂Z1 ,...,Zm and ŶZ1 ,...,Zm denote best linear approximations of X
and Y based on Z1 , . . . , Zm , respectively.
Let (Yt )t∈Z be an ARMA(p, q)-process satisfying the stationarity con-
dition (2.4) with expectation E(Yt ) = 0 and variance γ(0) > 0. The
partial autocorrelation α(t, k) for k > 1 is the partial correlation
of Yt and Yt−k , where the linear influence of the intermediate vari-
ables Yi , t − k < i < t, is removed, i.e., the best linear approxima-
tion of Yt based Pk−1on the k − 1 preceding process variables, denoted
by Ŷt,k−1 := i=1 âi Yt−i . Since Ŷt,k−1 minimizes the mean squared
error E((Yt − Ỹt,a,k−1 )2 ) among all linear combinations Ỹt,a,k−1 :=
Pk−1 T k−1
i=1 ai Yt−i , a := (a1 , . . . , ak−1 ) ∈ R of Yt−k+1 , . . . , Yt−1 , we find,
Pk−1
due to the stationarity condition, that Ŷt−k,−k+1 := i=1 âi Yt−k+i is
a best linear approximation of Yt−k based on the k − 1 subsequent
process variables. Setting Ŷt,0 = 0 in the case of k = 1, we obtain, for
k > 0,
α(t, k) := Corr( Y_t − Ŷ_{t,k−1}, Y_{t−k} − Ŷ_{t−k,−k+1} )
= Cov( Y_t − Ŷ_{t,k−1}, Y_{t−k} − Ŷ_{t−k,−k+1} ) / Var( Y_t − Ŷ_{t,k−1} ).
Note that Var(Y_t − Ŷ_{t,k−1}) > 0 is provided by the preliminary con-
ditions, which will be shown later by the proof of Theorem 7.1.1.
Observe moreover that Ŷ_{t+h,k−1} = Σ_{i=1}^{k−1} â_i Y_{t+h−i} for all h ∈ Z, imply-
ing α(t, k) = Corr(Y_t − Ŷ_{t,k−1}, Y_{t−k} − Ŷ_{t−k,−k+1}) = Corr(Y_k − Ŷ_{k,k−1}, Y_0 −
Ŷ_{0,−k+1}) = α(k, k). Consequently the partial autocorrelation function
can be more conveniently written as
α(k) := Cov( Y_k − Ŷ_{k,k−1}, Y_0 − Ŷ_{0,−k+1} ) / Var( Y_k − Ŷ_{k,k−1} )   (7.1)
for k > 0 and α(0) = 1 for k = 0. For negative k, we set α(k) :=
α(−k).
The determination of the partial autocorrelation coefficient α(k) at lag
k > 1 entails the computation of the coefficients of the correspond-
ing best linear approximation Ŷk,k−1 , leading to an equation system
similar to the normal equations coming from a regression model. Let
Ỹk,a,k−1 = a1 Yk−1 + · · · + ak−1 Y1 be an arbitrary linear approximation
of Yk based on Y1 , . . . , Yk−1 . Then, the mean squared error is given by
E((Y_k − Ỹ_{k,a,k−1})²) = E(Y_k²) − 2 E(Y_k Ỹ_{k,a,k−1}) + E(Ỹ²_{k,a,k−1})
= E(Y_k²) − 2 Σ_{l=1}^{k−1} a_l E(Y_k Y_{k−l}) + Σ_{i=1}^{k−1} Σ_{j=1}^{k−1} a_i a_j E(Y_{k−i} Y_{k−j})
or, respectively,
( â_1, â_2, . . . , â_{k−1} )^T = P_{k−1}^{−1} ( ρ(1), ρ(2), . . . , ρ(k − 1) )^T   (7.3)
if P_{k−1} is regular. A best linear approximation Ŷ_{k,k−1} of Y_k based
on Y_1, . . . , Y_{k−1} obviously has to share this necessary condition (7.2).
Since Ŷ_{k,k−1} equals a best linear one-step forecast of Y_k based on
Y_1, . . . , Y_{k−1}, Lemma 2.3.2 shows that Ŷ_{k,k−1} = â^T y is a best linear
approximation of Y_k. Thus, if P_{k−1} is regular, then Ŷ_{k,k−1} is given by
Ŷ_{k,k−1} = â^T y = V_{Y_k y} V_{yy}^{−1} y.   (7.4)
The next Theorem will now establish the necessary regularity of Pk .
This shows the boundedness of λ_i^{(t)} for a fixed i. On the other hand,
γ(0) = Cov( Y_{t+κ+1}, Σ_{i=1}^{κ} λ_i^{(t)} Y_i ) ≤ Σ_{i=1}^{κ} |λ_i^{(t)}| |γ(t + κ + 1 − i)|.
P(k − 1) := Var( Y_k − Ŷ_{k,k−1} ) / Var(Y_k),   (7.5)
if Var(Y_k) > 0. Observe that the greater the power, the less precise
the approximation Ŷ_{k,k−1} performs. Let again (Y_t)_{t∈Z} be an ARMA(p, q)-
process satisfying the stationarity condition with expectation E(Y_t) =
0 and variance Var(Y_t) > 0. Furthermore, let Ŷ_{k,k−1} := Σ_{u=1}^{k−1} â_u(k − 1) Y_{k−u}
denote the best linear approximation of Y_k based on the k − 1
preceding random variables for k > 1. Then, equation (7.4) and
Theorem 7.1.1 provide
P(k − 1) := Var( Y_k − Ŷ_{k,k−1} ) / Var(Y_k) = Var( Y_k − V_{Y_k y} V_{yy}^{−1} y ) / Var(Y_k)
= (1/Var(Y_k)) ( Var(Y_k) + Var(V_{Y_k y} V_{yy}^{−1} y) − 2 E(Y_k V_{Y_k y} V_{yy}^{−1} y) )
= (1/γ(0)) ( γ(0) + V_{Y_k y} V_{yy}^{−1} V_{yy} V_{yy}^{−1} V_{y Y_k} − 2 V_{Y_k y} V_{yy}^{−1} V_{y Y_k} )
= (1/γ(0)) ( γ(0) − V_{Y_k y} â(k − 1) ) = 1 − Σ_{i=1}^{k−1} ρ(i) â_i(k − 1),   (7.6)
where â(k − 1) := ( â_1(k − 1), â_2(k − 1), . . . , â_{k−1}(k − 1) )^T. Note that
P(k − 1) ≠ 0 is provided by the proof of the previous Theorem 7.1.1.
where P(k − 1) = 1 − Σ_{i=1}^{k−1} ρ(i) â_i(k − 1) denotes the power of approx-
imation and where the equation system of order k reads
P_k ( â_1(k), â_2(k), . . . , â_k(k) )^T = ( ρ(1), ρ(2), . . . , ρ(k) )^T,
with P_k = ( ρ(i − j) )_{1≤i,j≤k}.
and
ρ(k − 1)â1 (k) + ρ(k − 2)â2 (k) + · · · + ρ(1)âk−1 (k) + âk (k) = ρ(k).
(7.8)
Multiplying P_{k−1}^{−1} to the left we get
( â_1(k), â_2(k), . . . , â_{k−1}(k) )^T
= ( â_1(k−1), â_2(k−1), . . . , â_{k−1}(k−1) )^T − â_k(k) ( â_{k−1}(k−1), â_{k−2}(k−1), . . . , â_1(k−1) )^T.
This is the central recursion equation system (ii). Applying the corre-
sponding terms of â1 (k), . . . , âk−1 (k) from the central recursion equa-
7.1 Partial Correlation and Levinson–Durbin Recursion 231
The last step follows from (7.2), which implies that Yk − Ŷk,k−1 is
uncorrelated with Y1 , . . . , Yk−1 . Setting Ŷk,k−1 = k−1
P
u=1 âu (k − 1)Yk−u ,
we attain for the numerator
Cov(Yk − Ŷk,k−1 , Y0 ) = Cov(Yk , Y0 ) − Cov(Ŷk,k−1 , Y0 )
k−1
X
= γ(k) − âu (k − 1) Cov(Yk−u , Y0 )
u=1
k−1
X
= γ(0)(ρ(k) − âu (k − 1)ρ(k − u)).
u=1
Proof. The assertion follows directly from the previous Lemma 7.1.4.
Cramer–Wold Device
Definition 7.2.1. A sequence of real valued random variables (Yn )n∈N ,
defined on some probability space (Ω, A, P), is said to be asymptoti-
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 235
cally normal with asymptotic mean µn and asymptotic variance σn2 >
D
0 for sufficiently large n, written as Yn ≈ N (µn , σn2 ), if σn−1 (Yn − µn ) →
Y as n → ∞, where Y is N(0,1)-distributed.
(iv) lim supn→∞ P ({Yn ∈ C}) ≤ P ({Y ∈ C}) for any closed set
C ⊂ Rk ,
(v) lim inf n→∞ P ({Yn ∈ O}) ≥ P ({Y ∈ O}) for any open set O ⊂
Rk .
236 The Box–Jenkins Program: A Case Study
Proof. (i) ⇒ (ii): Suppose that F (x), Fn (x) denote the correspond-
ing distribution functions of real valued random k-vectors Y , Yn for
all n ∈ N, respectively, satisfying Fn (x) → F (x) as n → ∞ for ev-
ery continuity point x of F (x). Let ϑ : Rk → R be a bounded and
continuous function, which is obviously bounded by the finite value
B := supx {|ϑ(x)|}. Now, given ε > 0, we find, due to the right-
continuousness, continuity points ±C := ±(C1 , . . . , Ck )T of F (x),
with Cr 6= 0 for r = 1, . . . , k and a compact set K := {(x1 , . . . , xk ) :
−Cr ≤ xr ≤ Cr , r = 1, . . . , k} such that P {Y ∈ / K} < ε/B. Note
that this entails P {Yn ∈ / K} < 2ε/B for n sufficiently large. Now, we
choose l ≥ 2 continuity points xj := (xj1 , . . . , xjk )T , j ∈ {1, . . . , l},
of F (x) such that −Cr = x1r < · · · < xlr = Cr for each r ∈
{1, . . . , k} and such that supx∈K |ϑ(x) − ϕ(x)| < ε, where ϕ(x) :=
Pl−1
i=1 ϑ(xi )1(xi ,xi+1 ] (x). Then, we attain, for n sufficiently large,
as n → ∞.
(ii) ⇒ (iv): Let C ⊂ Rk be a closed set. We define ψC (y) := inf{||y −
x|| : x ∈ C}, ψ : Rk → R, which is a continuous function, as well as
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 237
|φXn (t) − φY (t)| ≤ |φXn (t) − φYmn (t)| + |φYmn (t) − φYm (t)|
+ |φYm (t) − φY (t)|, (7.12)
where the first term on the right-hand side satisfies, for δ > 0,
which implies
Moreover, we find
|φXn (t) − φY (t)| ≤ |φXn (t) − φYn (t)| + |φYn (t) − φY (t)|
→ 0 as n → ∞.
Due to the weak law of large numbers, which provides the conver-
1
Pn P P
gence
P in probability n t=1 Zt−u → µ as n → ∞,P we find Ynm →
µ |u|≤m bu as n → ∞. Defining now Ym := µ |u|≤m bu , entailing
Ym → µ ∞
P
u=−∞ bu as m → ∞, it remains to show that
2
P∞
The first term converges
P∞ in probability to
P∞σ ( Pb∞
u=−∞ u bu+k ) = γ(k)
by Lemma 7.2.12 and u=−∞ |bu bu+k | ≤ u=−∞ |bu | w=−∞ |bw+k | <
∞. It remains to show that
n ∞
1X X X P
Wn := bu bw Zt−u Zt−w+k → 0
n t=1 u=−∞
w6=u−k
as n → ∞. We approximate Wn by
n
1X X X
Wnm := bu bw Zt−u Zt−w+k .
n t=1
|u|≤m |w|≤m,w6=u+k
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 245
P
for every ε > 0 in order to establish Wn → 0 as n → ∞. Applying
Markov’s inequality, we attain
This shows
since bu → 0 as u → ∞.
Definition 7.2.14. A sequence (Yt )t∈Z of square integrable, real val-
ued random variables is said to be strictly stationary if (Y1 , . . . , Yk )T
and (Y1+h , . . . , Yk+h )T have the same joint distribution for all k > 0
and h ∈ Z.
Observe that strict stationarity implies (weak) stationarity.
Definition 7.2.15. A strictly stationary sequence (Yt )t∈Z of square
integrable, real valued random variables is said to be m-dependent,
m ≥ 0, if the two sets {Yj |j ≤ k} and {Yi |i ≥ k + m + 1} are
independent for each k ∈ Z.
In the special case of m = 0, m-dependence reduces to independence.
Considering especially a MA(q)-process, we find m = q.
246 The Box–Jenkins Program: A Case Study
Proof. We have
n n n n
√ 1 X 1 X 1 XX
Var( nȲn ) = n E Yi Yj = γ(i − j)
n i=1
n j=1
n i=1 j=1
1
= 2γ(n − 1) + 4γ(n − 2) + · · · + 2(n − 1)γ(1) + nγ(0)
n
n−1 1
= γ(0) + 2 γ(1) + · · · + 2 γ(n − 1)
n n
X |k|
= 1− γ(k).
n
|k|<n,k∈Z
(p)
To prove (ii) we define random variables Ỹi := (Yi+1 +· · ·+Yi+p ), i ∈
N0 := N ∪ {0}, as the sum of p, p > m, sequent variables taken from
(p)
the sequence (Yt )t∈Z . Each Ỹi has expectation zero and variance
p X
p
(p)
X
Var(Ỹi ) = γ(l − j)
l=1 j=1
= pγ(0) + 2(p − 1)γ(1) + · · · + 2(p − m)γ(m).
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 247
√
with variance Var( nȲn − √1n Yrp ) = n1 ((r − 1) Var(Y1 + · · · + Ym ) +
Var(Y1 + · · · + Ym+n−r(p+m) )). From Chebychev’s inequality, we know
√ √
1 1
P nȲn − √ Yrp ≥ ε ≤ ε−2 Var
nȲn − √ Yrp .
n n
248 The Box–Jenkins Program: A Case Study
√
1
lim lim sup P nȲn − √ Yrp ≥ ε
p→∞ n→∞ n
1
≤ lim Var(Y1 + · · · + Ym ) = 0.
p→∞ (p + m)ε2
√ D
Hence, nȲn → N (0, Vm ) as n → ∞.
7.2.17. Let (Yt )t∈Z be the MA(q)-process Yt = qu=0 bu εt−u
P
Example P
q
satisfying u=0 bu 6= 0 and b0 = 1, where the εt are independent,
identically distributed and square integrable random variables with
E(εt ) = 0 and Var(εt ) = σ 2 > 0. Since the process is a q-dependent
strictly stationary sequence with
q q q
!
X X X
2
γ(k) = E bu εt−u bv εt+k−v =σ bu bu+k
u=0 v=0 u=0
σ 2 ( qj=0 bj )2 > 0.
P
2 P
Theorem 7.2.16 (ii) then implies that Ȳn ≈ N (0, σn ( qj=0 bj )2 ) for
sufficiently large n.
P∞
Theorem 7.2.18. Let Yt = u=−∞ bu Zt−u , t ∈ Z, be a station-
ary process with absolutely summable real valued filter (bu )u∈Z , where
(Zt )t∈Z is a process of independent, identically distributed, square inte-
grable and P real valued random variables with E(Zt ) = 0 and Var(Zt ) =
σ > 0. If ∞
2
u=−∞ bu 6= 0, then, for n ∈ N,
∞
√ D
2
X 2
nȲn → N 0, σ bu as n → ∞, as well as
u=−∞
∞
σ2 X 2
Ȳn ≈ N 0, bu for n sufficiently large,
n u=−∞
1
Pn
where Ȳn := n i=1 Yi .
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 249
(m) (m)
Proof. We approximate Yt by Yt := m
P
u=−m bu Zt−u and let Ȳn :=
1
Pn (m)
n t=1 Yt . With Theorem 7.2.16 and Example 7.2.17 above, we
attain, as n → ∞,
√ (m) D (m)
nȲn → Y ,
D
In order to show Ȳn → Y as n → ∞ by Theorem 7.2.7 and Cheby-
chev’s inequality, we have to proof
√
lim lim sup Var( n(Ȳn − Ȳn(m) )) = 0.
m→∞ n→∞
as n → ∞.
250 The Box–Jenkins Program: A Case Study
Yn = Xn ap + En (7.16)
The considered process (Yt )t∈Z satisfies the stationarity condition. So,
Theorem
P 2.2.3 provides the almost surely stationary solution Yt =
u≥0 bu εt−u , t ∈ Z. Consequently, we are able to approximate Yt
(m) (m)
:= m
P
by Yt u=0 bu εt−u and furthermore Wt by the term Wt :=
(m) (m) T T
(Yt−1 εt , . . . , Yt−p εt ) . Taking an arbitrary vector λ := (λ1 , . . . , λp ) ∈
Rp , λ 6= 0, we gain a strictly stationary (m + p)-dependent sequence
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 253
(m)
(Rt )t∈Z defined by
(m) (m) (m) (m)
Rt := λT Wt = λ1 Yt−1 εt + · · · + λp Yt−p εt
Xm X m
= λ1 bu εt−u−1 εt + λ2 bu εt−u−2 εt + . . .
u=0 u=0
X m
+ λp bu εt−u−p εt
u=0
(m)
with expectation E(Rt ) = 0 and variance
(m) (m)
Var(Rt ) = λT Var(Wt )λ = σ 2 λT Σ(m)
p λ > 0,
(m)
as n → ∞, where U (m) is N (0, σ 2 Σp )-distributed. Since, entrywise,
(m) D
σ 2 Σp → σ 2 Σp as m → ∞, we attain λT U (m) → λT U as m → ∞
by the dominated convergence theorem, where U follows the normal
distribution N (0, σ 2 Σp ). With Theorem 7.2.7, it only remains to show
that, for every ε > 0,
n 1 X n n
T 1 X T (m) o
lim lim sup P √ λ Wt − √ λ Wt > ε = 0
m→∞ n→∞ n t=1 n t=1
254 The Box–Jenkins Program: A Case Study
to establish
n
1 X T D
√ λ Wt → λT U
n t=1
(m)
being independent of n. Since almost surely Wt → Wt as m → ∞,
Chebychev’s inequality finally gives
n 1 n o
(m)
X
T
lim lim sup P √ λ (Wt − Wt ) ≥ ε
m→∞ n→∞ n t=1
1 (m)
≤ lim 2 λT Var(Wt − Wt )λ = 0.
m→∞ ε
n−k n−k
1 X X
√ (Yt+k − Ȳn )(Yt − Ȳn ) − Ys Ys+k
n t=1
s=1−k
0 n−k
1 X 1 X n−k
= −√ Yt Yt+k − √ Ȳn Ys+k + Ys + √ Ȳn2 , (7.19)
n n s=1
n
t=1−k
Focusing now on the second term in (7.18), Lemma 7.2.13 implies that
1 T P
Xn Yn → γ(p)
n
as n → ∞. Hence, we need to show the convergence in probability
√ −1 P
n ||Ĉp,n − n(XnT Xn )−1 || → 0 as n → ∞,
−1
where ||Ĉp,n − n(XnT Xn )−1 || denotes the Euclidean norm of the p2
−1
dimensional vector consisting of all entries of the p × p-matrix Ĉp,n −
n(XnT Xn )−1 . Thus, we attain
√ −1
n ||Ĉp,n − n(XnT Xn )−1 || (7.21)
√ −1 −1
= n ||Ĉp,n (n XnT Xn − Ĉp,n )n(XnT Xn )−1 ||
√ −1
≤ n ||Ĉp,n || ||n−1 XnT Xn − Ĉp,n || ||n(XnT Xn )−1 ||, (7.22)
where
√
n ||n−1 XnT Xn − Ĉp,n ||2
p X
X p X n
−1/2 −1/2
=n n Ys−i Ys−j
i=1 j=1 s=1
n−|i−j|
X 2
−1/2
−n (Yt+|i−j| − Ȳn )(Yt − Ȳn ) .
t=1
or, equivalently,
n−i n−k
P
X X
−1/2 −1/2
n Ys Ys−j+i − n Yt Yt+k → 0 as n → ∞.
s=1−i t=1−k
7.2 Asymptotic Normality of Partial Autocorrelation Estimator 257
entails that
n 0 n o
−1/2 X P
X
−1/2
P n Ys Ys−j+i − n Yt+i−j Yt ≥ ε → 0 as n → ∞.
s=1−i t=n−i+1
−1 P P
This completes the proof, since Ĉp,n → Σ−1 T
p as well as n(Xn Xn )
−1
→
−1
Σp as n → ∞ by Lemma 7.2.13.
258 The Box–Jenkins Program: A Case Study
−1 1
σkk = Vyy,x = ,
Var(Yk − Ŷk,k−1 )
Yn = OP (hn )
Yn = oP (hn )
Yn = O P (hn )
Yn = oP (hn )
As Yni = OP (1), for arbitrary η > 0, there exists κ > 0 such that
P {|Yni | > κ} < η for all n ∈ N. Choosing δ := κ, we finally attain
Lemma 7.3.4. Consider two sequences (Yn )n∈N and (Xn )n∈N of real
valued random k-vectors, all defined on the same probability space
P
(Ω, A, P), such that Yn − Xn = oP (1). If furthermore Yn → Y as
P
n → ∞, then Xn → Y as n → ∞.
which implies
k
X ∂ϑ
ϑ(Yn ) = ϑ(c) + (c)(Yni − ci ) + oP (hn ).
i=1
∂y i
Proof. The Taylor series expansion (e.g. Seeley, 1970, Section 5.3)
gives, as y → c,
k
X ∂ϑ
ϑ(y) = ϑ(c) + (c)(yi − ci ) + o(|y − c|),
i=1
∂y i
The Delta-Method
Theorem 7.3.7. (The Multivariate Delta-Method) Consider a se-
quence of real valued random k-vectors (Yn )n∈N , all defined on the
D
same probability space (Ω, A, P), such that h−1 n (Yn − µ) → N (0, Σ)
as n → ∞ with µ := (µ, . . . , µ)T ∈ Rk , hn → 0 as n → ∞ and
Σ := (σrs )1≤r,s≤k being a symmetric and positive definite k ×k-matrix.
Moreover, let ϑ = (ϑ1 , . . . , ϑm )T : y → ϑ(y) be a function from Rk
into Rm , m ≤ k, where each ϑj , 1 ≤ j ≤ m, is continuously differen-
tiable in a neighborhood of µ. If
∂ϑ
j
∆ := (y)
∂yi µ 1≤j≤m,1≤i≤k
7.3 Asymptotic Normality of Autocorrelation Estimator 263
as n → ∞.
or, respectively,
h−1 −1
n (ϑ(Yn ) − ϑ(µ)) = hn ∆(Yn − µ) + oP (1).
D
We know h−1 T
n ∆(Yn − µ) → N (0, ∆Σ∆ ) as n → ∞, thus, we con-
D
clude from Lemma 7.2.8 that h−1 T
n (ϑ(Yn ) − ϑ(µ)) → N (0, ∆Σ∆ ) as
n → ∞ as well.
Observe that
4
ασ
if g = h = i = j,
E(εg εh εi εj ) = σ 4 if g = h 6= i = j, g = i 6= h = j, g = j 6= h = i,
0 elsewhere.
We therefore find
E(Yt Yt+s Yt+s+r Yt+s+r+v )
X ∞ X∞ X∞ X∞
= bg bh+s bi+s+r bj+s+r+v E(εt−g εt−h εt−i εt−j )
g=−∞ h=−∞ i=−∞ j=−∞
X∞ X∞
4
= σ (bg bg+s bi+s+r bi+s+r+v + bg bg+s+r bi+s bi+s+r+v
g=−∞ i=−∞
+ bg bg+s+r+v bi+s bi+s+r )
X∞
4
+ (α − 3)σ bj bj+s bj+s+r bj+s+r+v
j=−∞
∞
X
= (α − 3)σ 4 bg bg+s bg+s+r bg+s+r+v
g=−∞
+ γ(s)γ(v) + γ(r + s)γ(r + v) + γ(r + s + v)γ(r).
7.3 Asymptotic Normality of Autocorrelation Estimator 265
Applying the result to the covariance of γ̃n (k) and γ̃n (l) provides
Defining
The absolutely summable filter (bu )u∈Z entails the absolute summa-
bility of the sequence (Cm )m∈Z . Hence, by the dominated convergence
266 The Box–Jenkins Program: A Case Study
Lemma 7.3.9. Consider the stationary process (Yt )t∈Z from the previ-
ous Lemma 7.3.8 satisfying E(ε4t ) := ασ 4 < ∞, α > 0, b0 = 1 and bu =
0 for u < 0. Let γ̃p,n := (γ̃n (0), . . . , γ̃n (p))T , γp := (γ(0), . . . , γ(p))T
for p ≥ 0, n ∈ N, and let the p × p-matrix Σc := (ckl )0≤k,l≤p be given
by
ckl := (α − 3)γ(k)γ(l)
X∞
+ γ(m)γ(m − k + l) + γ(m + l)γ(m − k) .
m=−∞
(q) Pq
Furthermore, consider the MA(q)-process Yt := u=0 bu εt−u , q ∈
N, t ∈ Z, with corresponding autocovariance function γ(k)(q) and the
(q) (q)
p × p-matrix Σc := (ckl )0≤k,l≤p with elements
(q)
ckl :=(α − 3)γ(k)(q) γ(l)(q)
∞
X
(q) (q) (q) (q)
+ γ(m) γ(m − k + l) + γ(m + l) γ(m − k) .
m=−∞
(q)
Then, if Σc and Σc are regular,
√ D
n(γ̃p,n − γp ) → N (0, Σc ),
as n → ∞.
7.3 Asymptotic Normality of Autocorrelation Estimator 267
(q)
Proof. Consider the MA(q)-process Yt := qu=0 bu εt−u , q ∈ N, t ∈
P
Z, with corresponding autocovariance function γ(k)(q) . We define
(q) (q) (q)
γ̃n (k)(q) := n1 nt=1 Yt Yt+k as well as γ̃p,n := (γ̃n (0)(q) , . . . , γ̃n (p)(q) )T .
P
Defining moreover
(q) (q) (q) (q) (q) (q) (q)
Yt := (Yt Yt , Yt Yt+1 , . . . , Yt Yt+p )T , t ∈ Z,
(q)
we attain a strictly stationary (q+p)-dependence sequence (λT Yt )t∈Z
for any λ := (λ1 , . . . , λp+1 )T ∈ Rp+1 , λ 6= 0. Since
n
1 X (q) (q)
Yt = γ̃p,n ,
n t=1
(q)
as n → ∞, where γp := (γ(0)(q) , . . . , γ(p)(q) )T . Since, entrywise,
(q)
Σc → Σc as q → ∞, the dominated convergence theorem gives
n
D
X
−1/2
n λT Yt − n1/2 λT γp → N (0, λT Σc λ)
t=1
√ P P∞
We know from Theorem 7.2.18 that either nȲ → 0, if u=0 bu =0
or
∞
√ D
2
X 2
nȲ → N 0, σ bu
u=0
P∞
as
√ n → ∞, if u=0 bu 6= 0. This entails the boundedness in probability
nȲ = OP (1) by Lemma 7.3.3. By Markov’s inequality, it follows,
for every ε > 0,
n 1 n
X o 1 1 n
X
P √ Yt+k Yt ≥ ε ≤ E √ Yt+k Yt
n ε n
t=n−k+1 t=n−k+1
k
≤ √ γ(0) → 0 as n → ∞,
ε n
Pn P
which shows that n−1/2 t=n−k+1 Yt+k Yt → 0 as n → ∞. Applying
Lemma 7.2.12 leads to
n−k
n−k 1X P
Ȳ − (Yt+k + Yt ) → 0 as n → ∞.
n n t=1
Proof. Note that r̂n (k) is well defined for sufficiently large n since
P
ĉn (0) → γ(0) = σ 2 ∞ 2
P
u=−∞ bu > 0 as n → ∞ by (7.14). Let ϑ be
x
the function defined by ϑ((x0 , x1 , . . . , xp )T ) = ( xx01 , xx02 , . . . , xp0 )T , where
xs ∈ R for s ∈ {0, . . . , p} and x0 6= 0. The multivariate delta-method
7.3.7 and Lemma 7.3.10 show that
ĉn (0) γ(0)
√ √
n ϑ ... − ϑ ... = n(r̂p,n − ρp ) → N 0, ∆Σc ∆T
D
p
X p
X
rij = δik ckl δjl
k=0 l=0
p
1 X
= − ρ(i) (c0l δjl + cil δjl )
γ(0)2
l=0
1
= ρ(i)ρ(j)c00 − ρ(i)c0j − ρ(j)c10 + c11
γ(0)2
X ∞
= ρ(i)ρ(j) (α − 3) + 2ρ(m)2
m=−∞
∞
X
0 0
− ρ(i) (α − 3)ρ(j) + 2ρ(m )ρ(m + j)
m0 =−∞
X ∞
∗ ∗
− ρ(j) (α − 3)ρ(i) + 2ρ(m )ρ(m − i) + (α − 3)ρ(i)ρ(j)
m∗ =−∞
∞
X
+ ρ(m°)ρ(m° − i + j) + ρ(m° + j)ρ(m° − i)
m°=−∞
∞
X
= 2ρ(i)ρ(j)ρ(m)2 − 2ρ(i)ρ(m)ρ(m + j)
m=−∞
− 2ρ(j)ρ(m)ρ(m − i) + ρ(m)ρ(m − i + j) + ρ(m + j)ρ(m − i) .
(7.26)
We may write
∞
X ∞
X
ρ(j)ρ(m)ρ(m − i) = ρ(j)ρ(m + i)ρ(m)
m=−∞ m=−∞
as well as
∞
X ∞
X
ρ(m)ρ(m − i + j) = ρ(m + i)ρ(m + j). (7.27)
m=−∞ m=−∞
using (7.27).
12 /* Compute mean */
13 PROC MEANS DATA=donau;
14 VAR discharge;
15 RUN;
16
17 /* Graphical options */
18 SYMBOL1 V=DOT I=JOIN C=GREEN H=0.3 W=1;
19 AXIS1 LABEL=(ANGLE=90 ’Discharge’);
20 AXIS2 LABEL=(’January 1985 to December 2004’) ORDER=(’01JAN85’d ’01
,→JAN89’d ’01JAN93’d ’01JAN97’d ’01JAN01’d ’01JAN05’d);
21 AXIS3 LABEL=(ANGLE=90 ’Autocorrelation’);
22 AXIS4 LABEL=(’Lag’) ORDER = (0 1000 2000 3000 4000 5000 6000 7000);
23 AXIS5 LABEL=(’I(’ F=CGREEK ’l)’);
24 AXIS6 LABEL=(F=CGREEK ’l’);
25
26 /* Generate data plot */
27 PROC GPLOT DATA=donau;
28 PLOT discharge*date=1 / VREF=201.6 VAXIS=AXIS1 HAXIS=AXIS2;
29 RUN;
30
31 /* Compute and plot empirical autocorrelation */
32 PROC ARIMA DATA=donau;
33 IDENTIFY VAR=discharge NLAG=7000 OUTCOV=autocorr NOPRINT;
34 PROC GPLOT DATA=autocorr;
35 PLOT corr*lag=1 /VAXIS=AXIS3 HAXIS=AXIS4 VREF=0;
36 RUN;
37
38 /* Compute periodogram */
39 PROC SPECTRA DATA=donau COEF P OUT=data1;
40 VAR discharge;
41
42 /* Adjusting different periodogram definitions */
43 DATA data2;
44 SET data1(FIRSTOBS=2);
45 p=P_01/2;
46 lambda=FREQ/(2*CoNSTANT(’PI’));
276 The Box–Jenkins Program: A Case Study
38 RUN;
39
40 /* Compute and plot autocorrelation of adjusted data */
41 PROC ARIMA DATA=seasad;
42 IDENTIFY VAR=sa NLAG=1000 OUTCOV=corrseasad NOPRINT;
43
44 /* Add confidence intervals */
45 DATA corrseasad;
46 SET corrseasad;
47 u99=0.079;
48 l99=-0.079;
49
50 PROC GPLOT DATA=corrseasad;
51 PLOT corr*lag=1 u99*lag=2 l99*lag=2 / OVERLAY VAXIS=AXIS5 HAXIS=
,→AXIS6 VREF=0 LEGEND=LEGEND1;
52 RUN;
53
54 /* Compute periodogram of adjusted data */
55 PROC SPECTRA DATA=seasad COEF P OUT=data1;
56 VAR sa;
57
58 /* Adjust different periodogram definitions */
59 DATA data2;
60 SET data1(FIRSTOBS=2);
61 p=P_01/2;
62 lambda=FREQ/(2*CoNSTANT(’PI’));
63 DROP P_01 FREQ;
64
65 /* Plot periodogram of adjusted data */
66 PROC GPLOT DATA=data2(OBS=100);
67 PLOT p*lambda=1 / VAXIS=AXIS3 HAXIS=AXIS4;
68 RUN;
69
70 /* Test for stationarity */
71 PROC ARIMA DATA=seasad;
72 IDENTIFY VAR=sa NLAG=100 STATIONARITY=(ADF=(7,8,9));
73 RUN; QUIT;
The seasonal influences of the periods of riodogram are created from the adjusted data.
both 365 and 2433 days are removed by The confidence interval I := [−0.079, 0.079]
the SEASONALITY option in the procedure pertaining to the 3σ-rule is displayed by the
PROC TIMESERIES. By MODE=ADD, an addi- variables l99 and u99.
tive model for the time series is assumed. The The final part of the program deals with the test
mean correction of -201.6 finally completes the for stationarity. The augmented Dickey-Fuller
adjustment. The dates of the original Donau- test is initiated by ADF in the STATIONARITY
woerth dataset need to be restored by means option in PROC ARIMA. Since we deal with true
of MERGE within the DATA statement. Then, the ARMA(p, q)-processes, we have to choose a
same steps as in the previous Program 7.4.1 high-ordered autoregressive model, whose or-
(donauwoerth firstanalysis.sas) are executed, der selection is specified by the subsequent
i.e., the general plot as well as the plots of numbers 7, 8 and 9.
the empirical autocorrelation function and pe-
7.4 First Examinations 281
2 1 2 2 2
σq := 1 + 2r(1) + 2r(2) · · · + 2r(q) . (7.30)
n
Since r̂n (i) is asymptotically normal distributed (see Section 7.3), we
will reject the hypothesis H0 that actually a MA(q)-process is underly-
ing the time series ỹ1 , . . . , ỹn , if significantly more than 1 − α percent
of the empirical autocorrelation coefficients with lag greater than q
lie outside of the confidence interval I := [−σq · q1−α/2 , σq · q1−α/2 ]
of level α, where q1−α/2 denotes the (1 − α/2)-quantile of the stan-
dard p normal distribution. Setting q = 10, we attain in our case
σq ≈ 5/7300 ≈ 0.026. By the 3σ rule, we would expect that almost
all empirical autocorrelation coefficients with lag greater than 10 will
be elements of the confidence interval J := [−0.079, 0.079]. Hence,
Plot 7.4.2b makes us doubt that we can express the time series by a
mere MA(q)-process with a small q ≤ 10.
The following figure of the empirical partial autocorrelation function
can mislead to the deceptive conclusion that an AR(3)-process might
282 The Box–Jenkins Program: A Case Study
partial partial
lag lag
autocorrelation autocorrelation
1 0.92142 44 0.03368
2 -0.33057 45 -0.02412
3 0.17989 47 -0.02892
4 0.02893 61 -0.03667
5 0.03871 79 0.03072
6 0.05010 81 -0.03248
7 0.03633 82 0.02521
15 0.03937 98 -0.02534
1 /* donauwoerth_pacf.sas */
2 TITLE1 ’Partial Autocorrelation’;
3 TITLE2 ’Donauwoerth Data’;
4
5 /* Note that this program requires the file ’seasad’ generated by the
,→previous program (donauwoerth_adjustment.sas) */
6
7 /* Graphical options */
8 SYMBOL1 V=DOT I=JOIN C=GREEN H=0.5;
9 SYMBOL2 V=NONE I=JOIN C=BLACK L=2;
10 AXIS1 LABEL=(ANGLE=90 ’Partial Autocorrelation’) ORDER=(-0.4 TO 1 BY
,→0.1);
11 AXIS2 LABEL=(’Lag’);
12 LEGEND2 LABEL=NONE VALUE=(’Partial autocorrelation of adjusted series’
13 ’Lower 95-percent confidence limit’ ’Upper 95-percent confidence
,→limit’);
14
15 /* Compute partial autocorrelation of the seasonal adjusted data */
16 PROC ARIMA DATA=seasad;
17 IDENTIFY VAR=sa NLAG=100 OUTCOV=partcorr NOPRINT;
18 RUN;
19
20 /* Add confidence intervals */
21 DATA partcorr;
22 SET partcorr;
23 u95=0.0234;
24 l95=-0.0234;
25
284 The Box–Jenkins Program: A Case Study
coefficients.
To specify the order of an invertible ARMA(p, q)-model
Information Criterions
To choose the orders p and q of an ARMA(p, q)-process one commonly
takes the pair (p, q) minimizing some information function, which is
based on the loglikelihood function.
For Gaussian ARMA(p, q)-processes we have derived the loglikelihood
function in Section 2.3
l(ϑ|y1 , . . . , yn )
n 1 1
= log(2πσ 2 ) − log(det Σ0 ) − 2 Q(ϑ|y1 , . . . , yn ). (7.32)
2 2 2σ
The maximum likelihood estimator ϑ̂ := (σ̂ 2 , µ̂, â1 , . . . , âp , b̂1 , . . . , b̂q ),
maximizing l(ϑ|y1 , . . . , yn ), can often be computed by deriving the
ordinary derivative and the partial derivatives of l(ϑ|y1 , . . . , yn ) and
equating them to zero. These are the so-called maximum likelihood
equations, which ϑ̂ necessarily has to satisfy. Holding σ 2 , a1 , . . . , ap , b1 ,
286 The Box–Jenkins Program: A Case Study
. . . , bq , fix we obtain
∂l(ϑ|y1 , . . . , yn ) 1 ∂Q(ϑ|y1 , . . . , yn )
=− 2
∂µ 2σ ∂µ
1 ∂
T 0 −1
=− 2 (y − µ) Σ (y − µ)
2σ ∂µ
1 ∂ T 0 −1 −1
=− 2 y Σ y + µT Σ 0 µ
2σ ∂µ
T 0 −1 T 0 −1
−µ·y Σ 1−µ·1 Σ y
1 −1 −1
=− 2
(2µ · 1T Σ0 1 − 2 · 1T Σ0 y),
2σ
where 1 := (1, 1, . . . , 1)T , y := (y1 , . . . , yn )T ∈ Rn . Equating the par-
tial derivative to zero yields finally the maximum likelihood estimator
µ̂ of µ
−1
1T Σ̂0 y
µ̂ = −1 , (7.33)
1T Σ̂0 1
where Σ̂0 equals Σ0 , where the unknown parameters a1 , . . . , ap and
b1 , . . . , bq are replaced by maximum likelihood estimators â1 , . . . , âp
and b̂1 , . . . , b̂q .
The maximum likelihood estimator σ̂ 2 of σ 2 can be achieved in an
analogous way with µ, a1 , . . . , ap , b1 , . . . , bq being held fix. From
∂l(ϑ|y1 , . . . , yn ) n Q(ϑ|y1 , . . . , yn )
2
=− 2+ ,
∂σ 2σ 2σ 4
we attain
Q(ϑ̂|y1 , . . . , yn )
σ̂ 2 = . (7.34)
n
The computation of the remaining maximum likelihood estimators
â1 , . . . , âp , b̂1 , . . . , b̂q is usually a computer intensive problem.
Since the maximized loglikelihood function only depends on the model
assumption Mp,q , lp,q (y1 , . . . , yn ) := l(ϑ̂|y1 , . . . , yn ) is suitable to cre-
ate a measure for comparing the goodness of fit of different models.
The greater lp,q (y1 , . . . , yn ), the higher the likelihood that y1 , . . . , yn
actually results from a Gaussian ARMA(p, q)-process.
7.5 Order Selection 287
l(ϑ̂|y1 , . . . , yn )
n 1 1
= − log(2πσ̂ 2 ) − log(det Σ̂0 ) − 2 Q(ϑ̂|y1 , . . . , yn )
2 2 2σ̂
as well as
l(ϑ̂|z1 , . . . , zn )
n 1 1
= − log(2πσ̂ 2 ) − log(det Σ̂0 ) − 2 Q(ϑ̂|z1 , . . . , zn ).
2 2 2σ̂
This leads to the representation
l(ϑ̂|z1 , . . . , zn )
1
= l(ϑ̂|y1 , . . . , yn ) − Q(ϑ̂|z1 , . . . , z n ) − Q(ϑ̂|y 1 , . . . , y n ) . (7.35)
2σ̂ 2
The asymptotic distribution of Q(ϑ̂|Z1 , . . . , Zn ) − Q(ϑ̂|Y1 , . . . , Yn ),
where Z1 , . . . , Zn and Y1 , . . . , Yn are independent samples resulting
from the ARMA(p, q)-process, can be computed for n sufficiently
large, which reveals an asymptotic expectation of 2σ 2 (p + q) (see for
example Brockwell and Davis, 1991, Section 8.11). If we now replace
the term Q(ϑ̂|z1 , . . . , zn ) − Q(ϑ̂|y1 , . . . , yn ) in (7.35) by its approxi-
mately expected value 2σ̂ 2 (p + q), we attain with (7.34)
l(ϑ̂|z1 , . . . , zn ) = l(ϑ̂|y1 , . . . , yn ) − (p + q)
n 1 1
= − log(2πσ̂ 2 ) − log(det Σ̂0 ) − 2 Q(ϑ̂|y1 , . . . , yn ) − (p + q)
2 2 2σ̂
n n 1 n
= − log(2π) − log(σ̂ 2 ) − log(det Σ̂0 ) − − (p + q).
2 2 2 2
288 The Box–Jenkins Program: A Case Study
The greater the value of l(ϑ̂|z1 , . . . , zn ), the more precisely the esti-
mated parameters σ̂ 2 , µ̂, â1 , . . . , âp , b̂1 , . . . , b̂q reproduce the true under-
lying ARMA(p, q)-process. Due to this result, Akaike’s Information
Criterion (AIC) is defined as
2 1
AIC := − l(ϑ̂|z1 , . . . , zn ) − log(2π) − log(det Σ̂0 ) − 1
n n
2(p + q)
= log(σ̂ 2 ) +
n
for sufficiently large n, since then n−1 log(det Σ̂0 ) becomes negligible.
Thus, the smaller the value of the AIC, the greater the loglikelihood
function l(ϑ̂|z1 , . . . , zn ) and, accordingly, the more precisely ϑ̂ esti-
mates ϑ.
The AIC adheres to the principle of parsimony as the model orders
are included in an additional term, the so-called penalty function.
However, comprehensive studies came to the conclusion that the AIC
has the tendency to overestimate the order p. Therefore, modified cri-
terions based on the AIC approach were suggested, like the Bayesian
Information Criterion (BIC)
2 (p + q) log(n)
BIC(p, q) := log(σ̂p,q )+
n
and the Hannan-Quinn Criterion
Remark 7.5.1. Although these criterions are derived under the as-
sumption of an underlying Gaussian process, they nevertheless can
tentatively identify the orders p and q, even if the original process is
not Gaussian. This is due to the fact that the criterions can be re-
garded as a measure of the goodness of fit of the estimated covariance
matrix Σ̂0 to the time series (Brockwell and Davis, 2002).
7.5 Order Selection 289
MINIC Method
The MINIC method is based on the previously introduced AIC and
BIC. The approach seeks the order pair (p, q) minimizing the BIC-
Criterion
2 (p + q) log(n)
BIC(p, q) = log(σ̂p,q )+
n
in a chosen order range of pmin ≤ p ≤ pmax and qmin ≤ q ≤ qmax ,
2
where σ̂p,q is an estimate of the variance Var(εt ) = σ 2 of the errors
εt in the model (7.31). In order to estimate σ 2 , we first have to esti-
mate the model parameters from the zero-mean time series ỹ1 , . . . , ỹn .
Since their computation by means of the maximum likelihood method
is generally very extensive, the parameters are estimated from the re-
gression model
S 2 (a1 , . . . , ap , b1 , . . . , bq )
X n
= (ỹt − a1 ỹt−1 − · · · − ap ỹt−p − b1 ε̂t−1 − · · · − bq ε̂t−q )2
t=max{p,q}+1
Xn
= ε̂2t
t=max{p,q}+1
where
ε̃ˆt := ỹt − â1 ỹt−1 − · · · − âp ỹt−p − b̂1 ε̂t−1 − · · · − b̂q ε̂t−q ,
290 The Box–Jenkins Program: A Case Study
ESACF Method
The second method provided by SAS is the ESACF method , devel-
oped by Tsay and Tiao (1984). The essential idea of this approach
is based on the estimation of the autoregressive part of the underly-
ing process in order to receive a pure moving average representation,
whose empirical autocorrelation function, the so-called extended sam-
ple autocorrelation function, will be close to 0 after the lag of order.
For this purpose, we utilize a recursion initiated by the best linear
(k) (k)
approximation with real valued coefficients Ŷt := λ0,1 Yt−1 + · · · +
(k)
λ0,k Yt−k of Yt in the model (7.31) for a chosen k ∈ N. Recall that the
coefficients satisfy
(k)
. . . ρ(k − 1) λ0,1
ρ(1) ρ(0)
.
.. = .
.. ... .
.. ...
(7.38)
ρ(k) ρ(−k + 1) . . . ρ(0) (k)
λ0,k
and are uniquely determined by Theorem 7.1.1. The lagged residual
(k) (k) (k) (k)
Rt−1,0 , Rs,0 := Ys − λ0,1 Ys−1 − · · · − λ0,k Ys−k , s ∈ Z, is a linear combi-
7.5 Order Selection 291
(i) (i)
where ωs (i) := ρ(s) − λ0,1 ρ(s + 1) − · · · − λ0,i ρ(s + i), s ∈ Z, i ∈ N.
Note that ωs (k) = 0 for s = −1, · · · − k by (7.38).
The recursion now proceeds with the best linear approximation of Yt
(k) (k)
based on Yt−1 , . . . , Yt−k , Rt−1,1 , Rt−2,0 with
(k) (k) (k) (k) (k)
Rt−1,1 := Yt−1 − λ1,1 Yt−2 − · · · − λ1,k Yt−k−1 − λ1,k+1 Rt−2,0
being the lagged residual of the previously executed regression. After
l > 0 such iterations, we get
(k+l) (k+l)
λ0,1 Yt−1 + · · · + λ0,k+l Yt−k−l
(k) (k)
= λl,1 Yt−1 + · · · + λl,k Yt−k
(k) (k) (k) (k) (k) (k)
+ λl,k+1 Rt−1,l−1 + λl,k+2 Rt−2,l−2 + · · · + λl,k+l Rt−l,0 ,
292 The Box–Jenkins Program: A Case Study
where
0
l
(k) (k) (k) (k) (k)
X
Rt0 ,l0 := Yt0 − λl0 ,1 Yt0 −1 − · · · − λl0 ,k Yt0 −k − λl0 ,k+i Rt0 −i,l0 −i
i=1
P0 (k) (k)
for t0 ∈ Z, l0 ∈ N0 , i=1 λl0 ,k+i Rt0 −i,l0 −i := 0 and where all occurring
(k)
coefficients are real valued. Since, on the other hand, Rt,l = Yt −
(k+l) (k+l)
λ0,1 Yt−1 − · · · − λ0,k+l Yt−k−l , the coefficients satisfy
1 0 ··· 0
(k+l−1) ... ..
Ik −λ0,1 1 .
(k)
λl,1
−λ0,2
(k+l−1)
−λ0,1
(k+l−2) ...
λ(k)
(k+l)
λ0,1
.. .. l,2 ..
. . 0
λ(k)
.
l,3
1 = ,
.. .. .. .
. . (k)
−λ0,1 . ..
(k)
.. .. ... .. λl,k+l−1 (k+l)
. . .
0lk λ0,k+l
(k+l−1) (k+l−2) (k) (k)
−λ0,k+l−2 −λ0,k+l−3 ··· −λ0,k−1 λl,k+l
(k+l−1) (k+l−2) (k)
−λ0,k+l−1 −λ0,k+l−2 ··· −λ0,k
(k) (k)
If we assume that the coefficients λl,1 , . . . , λl,k+l are uniquely deter-
7.5 Order Selection 293
Using partial derivatives and equating them to zero yields the follow-
294 The Box–Jenkins Program: A Case Study
for n sufficiently large and k sufficiently small, where the terms c(i) =
1
Pn−i
n t=1 ỹt+i ỹt , i = 1, . . . , k, denote the empirical autocorrelation coef-
ficients at lag i. We define until the end of this section for a given
real valued series (ci )i∈N
h
X h
Y
ci := 0 if h < j as well as ci := 1 if h < j.
i=j i=j
for t = 1 + k + l, . . . , n, where
0
k l
(k) (k) (k) (k)
X X
rt0 ,l0 := ỹt0 − ỹt0 −i λ̂l0 ,i − λ̂l0 ,k+j rt0 −j,l0 −j (7.43)
i=1 j=1
for t0 ∈ Z, l0 ∈ N0 .
7.5 Order Selection 295
(i) (i)
where ω̃s (i) := c(s)−λ0,1 c(s+1)−· · ·−λ0,i c(s+i), s ∈ Z, i ∈ N. Since
1
Pn−k P
n t=1 Yt−k Yt → γ(k) as n → ∞ by Theorem 7.2.13, we attain, for
(p) (p) (p) (p)
k = p, that (λ̂l,1 , . . . , λ̂l,p )T ≈ (λl,1 , . . . , λl,p )T = (a1 , . . . , ap , )T for n
sufficiently large, if, actually, the considered model (7.31) with p = k
and q ≤ l is underlying the zero-mean time series ỹ1 , . . . , ỹn . Thereby,
(p) (p) (p) (p)
Zt,l :=Yt − λ̂l,1 Yt−1 − λ̂l,2 Yt−2 − · · · − λ̂l,p Yt−p
≈Yt − a1 Yt−1 − a2 Yt−2 − · · · − ap Yt−p
=εt + b1 εt−1 + · · · + bq εt−q
where
α(B) := 1 − a1 B − · · · − ap B p ,
β(B) := 1 + b1 B + · · · + bq B q
Yt = α(B)−1 β(B)[εt ]
ESACF Algorithm
The computation of the relevant autoregressive coefficients in the
ESACF-approach follows a simple algorithm
(k+1)
(k) (k+1) (k) λ̂l−1,k+1
λ̂l,r = λ̂l−1,r − λ̂l−1,r−1 (k) (7.45)
λ̂l−1,k
(·)
for k, l ∈ N not too large and r ∈ {1, . . . , k}, where λ̂·,0 = −1. Observe
that the estimations in the previously discussed iteration become less
reliable, if k or l are too large, due to the finite sample size n. Note
that the formula remains valid for r < 1, if we define furthermore
(·)
λ̂·,i = 0 for negative i. Notice moreover that the algorithm as well as
(k ∗ )
the ESACF approach can only be executed, if λ̂l∗ ,k∗ turns out to be
unequal to zero for l∗ ∈ {0, . . . , l − 1} and k ∗ ∈ {k, . . . , k + l − l∗ − 1},
which, in general, is satisfied. Under latter condition above algorithm
can be shown inductively.
Proof. For k ∈ N not too large and t = k + 2, . . . , n,
(k+1) (k+1) (k+1)
ŷt,0 = λ̂0,1 ỹt−1 + · · · + λ̂0,k+1 ỹt−k−1
as well as
(k) (k) (k)
ŷt,1 = λ̂1,1 ỹt−1 + · · · + λ̂1,k ỹt−k +
(k) (k) (k)
+ λ̂1,k+1 ỹt−1 − λ̂0,1 ỹt−2 − · · · − λ̂0,k ỹt−k−1
with
h s1 +sX
2 +···=s s i
(k) (k) ∗(k)
Y
Φs,l0 := (−λ̂l0 ,k+s1 ) − λ̂l0 −(s−s2 −···−sj ),k+sj ,
sj =0 if sj−1 =0 j=2
where
(
∗(k) 1 if sj = 0,
(−λ̂l0 −(s−s2 −···−sj ),k+sj ) = (k)
−λ̂l0 −(s−s2 −···−sj ),k+sj else.
So, let the assertion (7.47) be shown for one l0 ∈ N0 . The induction
300 The Box–Jenkins Program: A Case Study
we get
(k) (k) (k+1) (k) (k+1)
λ̂0,k−1 Φl0 ,l0 = λ̂0,k − λ̂1,k Φl0 −1,l0 −1 ,
7.5 Order Selection 301
we finally attain
(k) (k) (k+1) (k+1) (k) (k+1)
λ̂m,k Φl−m,l = λ̂m,k+1 Φl−m−1,l−1 + λ̂m,k Φl−m,l−1 ,
⇔
l k
(k) (k) (k) (k) (k)
X X
λ̂l,r − λ̂l−1,r−1 λ̂l,k+1 + λ̂l−s,i Φs,l
s=2 i=max{0,1−s},s+i=r
l−1 k+1
(k+1) (k+1) (k+1)
X X
= λ̂l−1,r + λ̂l−1−v,j Φv,l−1
v=1 j=max{0,1−v},v+j=r
⇔
(k) (k+1) (k) (k) (k) (k+1)
λ̂l,r = λ̂l−1,r + λ̂l−1,r−1 λ̂l,k+1 − λ̂l−1,r−1 λ̂l−1,k+2
⇔
(k+1)
(k) (k+1) (k) λ̂l−1,k+1
λ̂l,r = λ̂l−1,r − λ̂l−1,r−1 (k)
λ̂l−1,k
SCAN Method
The smallest canonical correlation method, short SCAN-method , sug-
gested by Tsay and Tiao (1985), seeks nonzero vectors a and b ∈ Rk+1
304 The Box–Jenkins Program: A Case Study
T T aT Σyt,k ys,k b
Corr(a yt,k , b ys,k ) = T (7.52)
(a Σyt,k yt,k a)1/2 (bT Σys,k ys,k b)1/2
Σ−1 −1
yt,k yt,k Σyt,k ys,k Σys,k ys,k Σys,k yt,k , (7.55)
R, of Σ−1 −1
yt,k yt,k Σyt,k ys,k Σys,k ys,k Σys,k yt,k with corresponding eigenvalue zero,
since Σys,k yt,k µ1 = 0 · µ1 . Accordingly, the two linear combinations
Yt −a1 Yt−1 −· · ·−ap Yt−p and Ys −a1 Ys−1 −· · ·−ap Ys−p are uncorrelated
for t − s > q by (7.53), which isn’t surprising, since they are repre-
sentations of two MA(q)-processes, which are in deed uncorrelated for
t − s > q.
In practice, given a zero-mean time series ỹ1 , . . . , ỹn with sufficiently
large n, the SCAN method computes the smallest eigenvalues λ̂1,t−s,k
of the empirical counterpart of (7.55)
Σ̂−1 −1
yt,k yt,k Σ̂yt,k ys,k Σ̂ys,k ys,k Σ̂ys,k yt,k ,
Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
Lags MA 1 MA 2 MA 3 MA 4 MA 5 MA 6 MA 7 MA 8
---------SCAN-------- --------ESACF--------
p+d q BIC p+d q BIC
3 2 7.234889 4 5 7.238272
2 3 7.234557 1 6 7.237108
4 1 7.234974 2 6 7.237619
1 6 7.237108 3 6 7.238428
6 6 7.241525
8 6 7.243719
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
1 /* donauwoerth_orderselection.sas */
2 TITLE1 ’Order Selection’;
3 TITLE2 ’Donauwoerth Data’;
4
5 /* Note that this program requires the file ’seasad’ generated by
,→program donauwoerth_adjustment.sas */
6
7 /* Order selection by means of MINIC, SCAN and ESACF approach */
8 PROC ARIMA DATA=seasad;
9 IDENTIFY VAR=sa MINIC PERROR=(8:20) SCAN ESACF p=(1:8) q=(1:8);
10 RUN;
11
12 /* Estimate model coefficients for chosen orders p and q */
13 PROC ARIMA DATA=seasad;
14 IDENTIFY VAR=sa NLAG=100 NOPRINT;
15 ESTIMATE METHOD=CLS p=2 q=3;
16 RUN;
17
18 PROC ARIMA DATA=seasad;
19 IDENTIFY VAR=sa NLAG=100 NOPRINT;
20 ESTIMATE METHOD=CLS p=3 q=2;
21 RUN; QUIT;
The discussed order identification methods are of the autoregressive model is specified that
carried out in the framework of PROC ARIMA, is used for estimating the error sequence in
invoked by the options MINIC, SCAN and the MINIC approach. The second part of the
ESACF in the IDENTIFY statement. p = (1 : 8) program estimates the coefficients of the pre-
and q = (1 : 8) restrict the order ranges, ferred ARMA(2, 3)- and ARMA(3, 2)-models,
admitting integer numbers in the range of 1 using conditional least squares, specified by
to 8 for the autoregressive and moving aver- METHOD=CLS in the ESTIMATE statement.
age order, respectively. By PERROR, the order
p q BIC
2 3 7.234557
3 2 7.234889
4 1 7.234974
3 3 7.235731
2 4 7.235736
5 1 7.235882
4 2 7.235901
Yt − 1.61943Yt−1 + 0.63143Yt−2
= εt − 0.34553εt−1 − 0.34821εt−2 − 0.10327εt−3 (7.57)
3. Overfitting
following plots show these comparisons for the identified ARMA(3, 2)-
and ARMA(2, 3)-model.
1 /* donauwoerth_dcheck1.sas */
2 TITLE1 ’Theoretical process analysis’;
3 TITLE2 ’Donauwoerth Data’;
4 /* Note that this program requires the file ’corrseasad’ and ’seasad’
,→generated by program donauwoerth_adjustment.sas */
5
18 PROC IML;
19 PHI={1 -1.85383 1.04408 -0.17899};
20 THETA={1 -0.58116 -0.23329};
21 LAG=1000;
22 CALL ARMACOV(COV, CROSS, CONVOL, PHI, THETA, LAG);
23 N=1:1000;
24 theocorr=COV/7.7608247812;
25 CREATE autocorr32;
7.6 Diagnostic Check 315
26 APPEND;
27 QUIT;
28
29 %MACRO Simproc(p,q);
30
31 /* Graphical options */
32 SYMBOL1 V=NONE C=BLUE I=JOIN W=2;
33 SYMBOL2 V=DOT C=RED I=NONE H=0.3 W=1;
34 SYMBOL3 V=DOT C=RED I=JOIN L=2 H=1 W=2;
35 AXIS1 LABEL=(ANGLE=90 ’Autocorrelations’);
36 AXIS2 LABEL=(’Lag’) ORDER = (0 TO 500 BY 100);
37 AXIS3 LABEL=(ANGLE=90 ’Partial Autocorrelations’);
38 AXIS4 LABEL=(’Lag’) ORDER = (0 TO 50 BY 10);
39 LEGEND1 LABEL=NONE VALUE=(’theoretical’ ’empirical’);
40
41 /* Comparison of theoretical and empirical autocorrelation */
42 DATA compare&p&q;
43 MERGE autocorr&p&q(KEEP=N theocorr) corrseasad(KEEP=corr);
44 PROC GPLOT DATA=compare&p&q(OBS=500);
45 PLOT theocorr*N=1 corr*N=2 / OVERLAY LEGEND=LEGEND1 VAXIS=AXIS1
,→HAXIS=AXIS2;
46 RUN;
47
48 /* Computing of theoretical partial autocorrelation */
49 PROC TRANSPOSE DATA=autocorr&p&q OUT=transposed&p&q PREFIX=CORR;
50 DATA pacf&p&q(KEEP=PCORR0-PCORR100);
51 PCORR0=1;
52 SET transposed&p&q(WHERE=(_NAME_=’THEOCORR’));
53 ARRAY CORRS(100) CORR2-CORR101;
54 ARRAY PCORRS(100) PCORR1-PCORR100;
55 ARRAY P(100) P1-P100;
56 ARRAY w(100) w1-w100;
57 ARRAY A(100) A1-A100;
58 ARRAY B(100) B1-B100;
59 ARRAY u(100) u1-u100;
60 PCORRS(1)=CORRS(1);
61 P(1)=1-(CORRS(1)**2);
62 DO i=1 TO 100; B(i)=0; u(i)=0; END;
63 DO j=2 TO 100;
64 IF j > 2 THEN DO n=1 TO j-2; A(n)=B(n); END;
65 A(j-1)=PCORRS(j-1);
66 DO k=1 TO j-1; u(j)=u(j)-A(k)*CORRS(j-k); END;
67 w(j)=u(j)+CORRS(j);
68 PCORRS(j)=w(j)/P(j-1);
69 P(j)=P(j-1)*(1-PCORRS(j)**2);
70 DO m=1 TO j-1; B(m)=A(m)-PCORRS(j)*A(j-m); END;
71 END;
72 PROC TRANSPOSE DATA=pacf&p&q OUT=plotdata&p&q(KEEP=PACF1) PREFIX=PACF;
73
74 /* Comparison of theoretical and empirical partial autocorrelation */
75 DATA compareplot&p&q;
76 MERGE plotdata&p&q corrseasad(KEEP=partcorr LAG);
316 The Box–Jenkins Program: A Case Study
In all cases there exists a great agreement between the theoretical and
empirical parts. Hence, there is no reason to doubt the validity of any
of the underlying models.
Examination of Residuals
We suppose that the adjusted time series ỹ1 , . . . ỹn was generated by
an invertible ARMA(p, q)-model
for t = 1, . . . n, where ε̂t and ỹt are set equal to zero for t ≤ 0.
Under the model assumptions they show an approximate white noise
behavior and therefore, their empirical autocorrelation function r̂ε̂ (k)
(2.24) will be close to zero for k > 0 not too large. Note that above
estimation of the empirical autocorrelation coefficients becomes less
reliable as k → n.
The following two figures show the empirical autocorrelation functions
of the estimated residuals up to lag 100 based on the ARMA(3, 2)-
model (7.56) and ARMA(2, 3)-model (7.57), respectively. Their pat-
tern indicate uncorrelated residuals in both cases, confirming the ad-
equacy of the chosen models.
1 /* donauwoerth_dcheck2.sas*/
2 TITLE1 ’Residual Analysis’;
3 TITLE2 ’Donauwoerth Data’;
4
5 /* Note that this program requires ’seasad’ and ’donau’ generated by
,→the program donauwoerth_adjustment.sas */
6
7 /* Preparations */
8 DATA donau2;
9 MERGE donau seasad(KEEP=SA);
10
11 /* Test for white noise */
12 PROC ARIMA DATA=donau2;
13 IDENTIFY VAR=sa NLAG=60 NOPRINT;
14 ESTIMATE METHOD=CLS p=3 q=2 NOPRINT;
15 FORECAST LEAD=0 OUT=forecast1 NOPRINT;
16 ESTIMATE METHOD=CLS p=2 q=3 NOPRINT;
17 FORECAST LEAD=0 OUT=forecast2 NOPRINT;
18
19 /* Compute empirical autocorrelations of estimated residuals */
20 PROC ARIMA DATA=forecast1;
21 IDENTIFY VAR=residual NLAG=150 OUTCOV=cov1 NOPRINT;
22 RUN;
23
24 PROC ARIMA DATA=forecast2;
25 IDENTIFY VAR=residual NLAG=150 OUTCOV=cov2 NOPRINT;
26 RUN;
7.6 Diagnostic Check 319
27
28 /* Graphical options */
29 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5;
30 AXIS1 LABEL=(ANGLE=90 ’Residual Autocorrelation’);
31 AXIS2 LABEL=(’Lag’) ORDER=(0 TO 150 BY 10);
32
33 /* Plot empirical autocorrelation function of estimated residuals */
34 PROC GPLOT DATA=cov1;
35 PLOT corr*LAG=1 / VREF=0 VAXIS=AXIS1 HAXIS=AXIS2;
36 RUN;
37
38 PROC GPLOT DATA=cov2;
39 PLOT corr*LAG=1 / VREF=0 VAXIS=AXIS1 HAXIS=AXIS2;
40 RUN; QUIT;
One-step forecasts of the adjusted series are written together with the estimated resid-
ỹ1 , . . . ỹn are computed by means of the uals into the data file notated after the option
FORECAST statement in the ARIMA procedure. OUT. Finally, the empirical autocorrelations of
LEAD=0 prevents future forecasts beyond the the estimated residuals are computed by PROC
sample size 7300. The corresponding forecasts ARIMA and plotted by PROC GPLOT.
To Chi- Pr >
Lag Square DF ChiSq --------------Autocorrelations-----------------
To Chi- Pr >
Lag Square DF ChiSq --------------Autocorrelations-----------------
7 /* Test preparation */
8 DATA donau2;
9 MERGE donau seasad(KEEP=SA);
10
11 /* Test for white noise */
12 PROC ARIMA DATA=donau2;
13 IDENTIFY VAR=sa NLAG=60 NOPRINT;
14 ESTIMATE METHOD=CLS p=3 q=2 WHITENOISE=IGNOREMISS;
15 FORECAST LEAD=0 OUT=out32(KEEP=residual RENAME=residual=res32);
16 ESTIMATE METHOD=CLS p=2 q=3 WHITENOISE=IGNOREMISS;
17 FORECAST LEAD=0 OUT=out23(KEEP=residual RENAME=residual=res23);
18 RUN; QUIT;
The Portmanteau-test of Box–Ljung is initiated residuals are also written in the files out32 and
by the WHITENOISE=IGNOREMISS option in out23 for further use.
the ESTIMATE statement of PROC ARIMA. The
For both models, the Listings 7.6.3 and 7.6.3 indicate sufficiently great
p-values for the Box–Ljung test statistic up to lag 54, letting us pre-
sume that the residuals are representations of a white noise process.
7.6 Diagnostic Check 321
Overfitting
The third diagnostic class is the method of overfitting. To verify the
fitting of an ARMA(p, q)-model, slightly more comprehensive models
with additional parameters are estimated, usually an ARMA(p, q +
1)- and ARMA(p + 1, q)-model. It is then expected that these new
additional parameters will be close to zero, if the initial ARMA(p, q)-
model is adequate.
The original ARMA(p, q)-model should be considered hazardous or
critical, if one of the following aspects occur in the more comprehen-
sive models:
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
Standard Approx
Parameter Estimate Error t Value Pr > |t| Lag
SAS computes the model coefficients for the chosen orders (3, 3), (2, 4)
and (4, 2) by means of the conditional least squares method and pro-
vides the ARMA(3, 3)-model
Yt − 1.57265Yt−1 + 0.54576Yt−2 + 0.03884Yt−3
= εt − 0.29901εt−1 − 0.37478εt−2 − 0.12273εt−3 ,
the ARMA(4, 2)-model
Yt − 2.07048Yt−1 + 1.46595Yt−2 − 0.47350Yt−3 + 0.08461Yt−4
= εt − 0.79694εt−1 − 0.08912εt−2
as well as the ARMA(2, 4)-model
Yt − 1.63540Yt−1 + 0.64655Yt−2
= εt − 0.36178εt−1 − 0.35398εt−2 − 0.10115εt−3 + 0.0069054εt−4 .
324 The Box–Jenkins Program: A Case Study
since the ’old’ parameters in the ARMA(4, 2)- and ARMA(3, 3)-model
differ extremely from those inherent in the above ARMA(3, 2)-model
and the new added parameter −0.12273 in the ARMA(3, 3)-model is
not close enough to zero, as can be seen by the corresponding p-value.
Whereas the adjusted ARMA(2, 3)-model
Yt = − 1.61943Yt−1 + 0.63143Yt−2
= εt − 0.34553εt−1 − 0.34821εt−2 − 0.10327εt−3 (7.59)
7.7 Forecasting
To justify forecasts based on the apparently adequate ARMA(2, 3)-
model (7.59), we have to verify its forecasting ability. It can be
achieved by either ex-ante or ex-post best one-step forecasts of the
sample Y1 , . . . ,Yn . Latter method already has been executed in the
residual examination step. We have shown that the estimated forecast
errors ε̂t := ỹt − ŷt , t = 1, . . . , n can be regarded as a realization of a
white noise process. The ex-ante forecasts are based on the first n−m
observations ỹ1 , . . . , ỹn−m , i.e., we adjust an ARMA(2, 3)-model to the
reduced time series ỹ1 , . . . , ỹn−m . Afterwards, best one-step forecasts
ŷt of Yt are estimated for t = 1, . . . , n, based on the parameters of this
new ARMA(2, 3)-model.
Now, if the ARMA(2, 3)-model (7.59) is actually adequate, we will
again expect that the estimated forecast errors ε̂t = ỹt − ŷt , t = n −
m + 1, . . . , n behave like realizations of a white noise process, where
7.7 Forecasting 325
M = 3467
Max(P(*)) 44294.13
Sum(P(*)) 9531509
Kappa 16.11159
M = 182
Max(P(*)) 18698.19
Sum(P(*)) 528566
Kappa 6.438309
M-1 2584
Max(P(*)) 24742.39
Sum(P(*)) 6268989
Kappa 10.19851
M-1 2599
Max(P(*)) 39669.63
Sum(P(*)) 6277826
Kappa 16.4231
70 VAR P_01;
71 OUTPUT OUT=psum&k SUM=psum;
72 RUN;
73
74 /* Compute empirical distribution function of cumulated periodogram
,→and its confidence bands */
75 DATA conf&k;
76 SET periodo&k(FIRSTOBS=2);
77 IF _N_=1 THEN SET psum&k;
78 RETAIN s 0;
79 s=s+P_01/psum;
80 fm=_N_/(_FREQ_-1);
81 yu_01=fm+1.63/SQRT(_FREQ_-1);
82 yl_01=fm-1.63/SQRT(_FREQ_-1);
83 yu_05=fm+1.36/SQRT(_FREQ_-1);
84 yl_05=fm-1.36/SQRT(_FREQ_-1);
85
86 /* Graphical options */
87 SYMBOL3 V=NONE I=STEPJ C=GREEN;
88 SYMBOL4 V=NONE I=JOIN C=RED L=2;
89 SYMBOL5 V=NONE I=JOIN C=RED L=1;
90 AXIS5 LABEL=(’x’) ORDER=(.0 TO 1.0 BY .1);
91 AXIS6 LABEL=NONE;
92
93 /* Plot empirical distribution function of cumulated periodogram with
,→its confidence bands */
94 PROC GPLOT DATA=conf&k;
95 PLOT fm*s=3 yu_01*fm=4 yl_01*fm=4 yu_05*fm=5 yl_05*fm=5 / OVERLAY
,→HAXIS=AXIS5 VAXIS=AXIS6;
96 RUN;
97 %MEND;
98
99 %wn(1);
100 %wn(2);
101
102 /* Merge estimated residuals and estimated forecast errors */
103 DATA residuals;
104 MERGE residual1(KEEP=residual) residual2(KEEP=residual RENAME=(
,→residual=forecasterror));
105
106 /* Compute standard deviations */
107 PROC MEANS DATA=residuals;
108 VAR residual forecasterror;
109 RUN;
110
111 %wn(3);
112 %wn(4);
113
114 QUIT;
332 The Box–Jenkins Program: A Case Study
In the first DATA step, the adjusted discharge of the residuals and forecast errors. The resid-
values pertaining to the discharge values mea- uals coincide with the one-step forecasts errors
sured during the last year are removed from of the first 6935 observations, whereas the fore-
the sample. In the framework of the proce- cast errors are given by one-step forecasts er-
dure PROC ARIMA, an ARMA(2, 3)-model is rors of the last 365 observations. Visual repre-
adjusted to the reduced sample. The cor- sentation is created by PROC GPLOT.
responding estimated model coefficients and The macro-step then executes Fisher’s test for
further model details are written into the file hidden periodicities as well as the Bartlett–
’model’, by the option OUTMODEL. Best one- Kolmogorov–Smirnov test for both residuals
step forecasts of the ARMA(2, 3)-model are and forecast errors within the procedure PROC
then computed by PROC ARIMA again, where SPECTRA. After renaming, they are written into
the model orders, model coefficients and model a single file. PROC MEANS computes their stan-
mean are specified in the ESTIMATE state- dard deviations for a direct comparison.
ment, where the option NOEST suppresses a Afterwards tests for white noise are computed
new model estimation. for other numbers of estimated residuals by the
The following steps deal with the examination macro.
The ex-ante forecasts are computed from the dataset without the
last 365 observations. Plot 7.7.1b of the estimated forecast errors
ε̂t = ỹt − ŷt , t = n − m + 1, . . . , n indicates an accidental pattern.
Applying Fisher’s test as well as Bartlett-Kolmogorov-Smirnov’s test
for white noise, as introduced in Section 6.1, the p-value of 0.0944
and Fisher’s κ-statistic of 6.438309 do not reject the hypothesis that
the estimated forecast errors are generated from a white noise pro-
cess (εt )t∈Z , where the εt are independent and identically normal dis-
tributed (Listing 7.7.1e). Whereas, testing the estimated residuals for
white noise behavior yields a conflicting result. Bartlett-Kolmogorov-
Smirnov’s test clearly doesn’t reject the hypothesis that the estimated
residuals are generated from a white noise process (εt )t∈Z , where the
εt are independent and identically normal distributed, due to the
great p-value of 0.9978 (Listing 7.7.1c). On the other hand, the
conservative test of Fisher produces an irritating, high κ-statistic of
16.11159, which would lead to a rejection of the hypothesis. Further
examinations with Fisher’s test lead to the κ-statistic of 10.19851,
when testing the first 5170 estimated residuals, and it leads to the
κ-statistic of 16.4231, when testing the first 5200 estimated residu-
als (Listing 7.7.1h and 7.7.1i). Applying the result of Exercise 6.12
P {κm > x} ≈ 1 − exp(−m e−x ), we obtain the approximate p-values
of 0.09 and < 0.001, respectively. That is, Fisher’s test doesn’t reject
the hypothesis for the first 5170 estimated residuals but it clearly re-
7.7 Forecasting 333
jects the hypothesis for the first 5200 ones. This would imply that
the additional 30 values cause such a great periodical influence such
that a white noise behavior is strictly rejected. In view of the small
number of 30, relatively to 5170, Plot 7.7.1 and the great p-value
0.9978 of Bartlett-Kolmogorov-Smirnov’s test we can’t trust Fisher’s
κ-statistic. Recall, as well, that the Portmanteau-test of Box–Ljung
has not rejected the hypothesis that the 7300 estimated residuals re-
sult from a white noise, carried out in Section 7.6. Finally, Listing
7.7.1g displays that the standard deviations 37.1 and 38.1 of the esti-
mated residuals and estimated forecast errors, respectively, are close
to each other.
Hence, the validity of the ARMA(2, 3)-model (7.59) has been estab-
lished, justifying us to predict future values using best h-step fore-
casts Ŷn+h based on this model. Since the model isP invertible, it can
be almost surely rewritten as AR(∞)-process Yt = u>0 cu Yt−u + εt .
Consequently,
Ph−1 the best h-step forecast Ŷn+h of Yn+h can be estimated
Pt+h−1
by ŷn+h = u=1 cu ŷn+h−u + v=h cv ỹt+h−v , cf. Theorem 2.3.4, where
the estimated forecasts ŷn+i , i = 1, . . . , h are computed recursively.
(o)
An estimated best h-step forecast ŷn+h of the original Donauwoerth
time series y1 , . . . , yn is then obviously given by
(o) (365) (2433)
ŷn+h := ŷn+h + µ + Ŝn+h + Ŝn+h , (7.60)
(365)
where µ denotes the arithmetic mean of y1 , . . . , yn and where Ŝn+h
(2433)
and Ŝn+h are estimations of the seasonal nonrandom components
at lag n + h pertaining to the cycles with periods of 365 and 2433
days, respectively. These estimations already have been computed
by Program 7.4.2. Note that we can only make reasonable forecasts
for the next few days, as ŷn+h approaches its expectation zero for
increasing h.
334 The Box–Jenkins Program: A Case Study
1 /* donauwoerth_final.sas */
2 TITLE1 ;
3 TITLE2 ;
4
5 /* Note that this program requires the file ’seasad’ and ’seasad1’
,→generated by program donauwoerth_adjustment.sas */
6
7 /* Computations of forecasts for next 31 days */
8 PROC ARIMA DATA=seasad;
9 IDENTIFY VAR=sa NLAG=300 NOPRINT;
10 ESTIMATE METHOD=CLS p=2 q=3 OUTMODEL=model NOPRINT;
11 FORECAST LEAD=31 OUT=forecast;
12 RUN;
13
14 /* Plot preparations */
15 DATA forecast(DROP=residual sa);
16 SET forecast(FIRSTOBS=7301);
17 N=_N_;
18 date=MDY(1,N,2005);
19 FORMAT date ddmmyy10.;
20
28
29
30 DATA plotforecast;
31 SET donau seascomp;
32
33 /* Graphical display */
34 SYMBOL1 V=DOT C=GREEN I=JOIN H=0.5;
35 SYMBOL2 V=DOT I=JOIN H=0.4 C=BLUE W=1 L=1;
36 AXIS1 LABEL=NONE ORDER=(’01OCT04’d ’01NOV04’d ’01DEC04’d ’01JAN05’d
,→’01FEB05’d);
37 AXIS2 LABEL=(ANGLE=90 ’Original series with forecasts’) ORDER=(0 to
,→300 by 100);
38 LEGEND1 LABEL=NONE VALUE=(’Original series’ ’Forecasts’ ’Lower 95-
,→percent confidence limit’ ’Upper 95-percent confidence limit’);
39
40 PROC GPLOT DATA=plotforecast(FIRSTOBS=7209);
41 PLOT discharge*date=1 forecast*date=2 / OVERLAY HAXIS=AXIS1 VAXIS=
,→AXIS2 LEGEND=LEGEND1;
42 RUN; QUIT;
The best h-step forecasts for the next month forecasts of the original time series are at-
are computed within PROC ARIMA by the tained. The procedure PROC GPLOT then cre-
FORECAST statement and the option LEAD. ates a graphical visualization of the best h-step
Adding the corresponding seasonal compo- forecasts.
nents and the arithmetic mean (see (7.60)),
Exercises
7.1. Show the following generalization of Lemma 7.3.1: Consider two
sequences (Yn )n∈N and (Xn )n∈N of real valued random k-vectors,
all defined on the same probability space (Ω, A, P) and a sequence
336 The Box–Jenkins Program: A Case Study
(hn )n∈N of positive real valued numbers, such that Yn = O P (hn ) and
Xn = oP (1). Then, Yn Xn = oP (hn ).
7.3. (IQ Data) Apply the Box–Jenkins program to the IQ Data. This
dataset contains a monthly index of quality ranging from 0 to 100.
Bibliography
Akaike, H. (1977). On entropy maximization principle. Applications
of Statistics, Amsterdam, pages 27–41.
Shiskin, J., Young, A., and Musgrave, J. (1967). The x–11 variant of
census method ii seasonal adjustment program. Technical paper 15,
Bureau of the Census, U.S. Dept. of Commerce.
Tintner, G. (1958). Eine neue methode für die schätzung der logistis-
chen funktion. Metrika, 1:154–157.
Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
http://fsf.org/
Everyone is permitted to copy and distribute verbatim copies of this license document, but
changing it is not allowed.
a Secondary Section may not explain any math- machine-generated HTML, PostScript or PDF
ematics.) The relationship could be a matter of produced by some word processors for output
historical connection with the subject or with purposes only.
related matters, or of legal, commercial, philo- The “Title Page” means, for a printed book,
sophical, ethical or political position regarding the title page itself, plus such following pages
them. as are needed to hold, legibly, the material this
The “Invariant Sections” are certain Sec- License requires to appear in the title page. For
ondary Sections whose titles are designated, as works in formats which do not have any title
being those of Invariant Sections, in the notice page as such, “Title Page” means the text near
that says that the Document is released under the most prominent appearance of the work’s ti-
this License. If a section does not fit the above tle, preceding the beginning of the body of the
definition of Secondary then it is not allowed to text.
be designated as Invariant. The Document may The “publisher” means any person or entity
contain zero Invariant Sections. If the Docu- that distributes copies of the Document to the
ment does not identify any Invariant Sections public.
then there are none. A section “Entitled XYZ” means a named
subunit of the Document whose title either
The “Cover Texts” are certain short passages
is precisely XYZ or contains XYZ in paren-
of text that are listed, as Front-Cover Texts or
theses following text that translates XYZ in
Back-Cover Texts, in the notice that says that
another language. (Here XYZ stands for a
the Document is released under this License. A
specific section name mentioned below, such
Front-Cover Text may be at most 5 words, and
as “Acknowledgements”, “Dedications”,
a Back-Cover Text may be at most 25 words.
“Endorsements”, or “History”.) To
A “Transparent” copy of the Document means “Preserve the Title” of such a section when
a machine-readable copy, represented in a for- you modify the Document means that it re-
mat whose specification is available to the gen- mains a section “Entitled XYZ” according to
eral public, that is suitable for revising the doc- this definition.
ument straightforwardly with generic text edi- The Document may include Warranty Dis-
tors or (for images composed of pixels) generic claimers next to the notice which states that
paint programs or (for drawings) some widely this License applies to the Document. These
available drawing editor, and that is suitable for Warranty Disclaimers are considered to be in-
input to text formatters or for automatic trans- cluded by reference in this License, but only as
lation to a variety of formats suitable for input regards disclaiming warranties: any other im-
to text formatters. A copy made in an other- plication that these Warranty Disclaimers may
wise Transparent file format whose markup, or have is void and has no effect on the meaning of
absence of markup, has been arranged to thwart this License.
or discourage subsequent modification by read-
ers is not Transparent. An image format is not 2. VERBATIM COPYING
Transparent if used for any substantial amount
of text. A copy that is not “Transparent” is You may copy and distribute the Document
called “Opaque”. in any medium, either commercially or non-
Examples of suitable formats for Transparent commercially, provided that this License, the
copies include plain ASCII without markup, copyright notices, and the license notice say-
Texinfo input format, LaTeX input format, ing this License applies to the Document are
SGML or XML using a publicly available reproduced in all copies, and that you add no
DTD, and standard-conforming simple HTML, other conditions whatsoever to those of this Li-
PostScript or PDF designed for human modifi- cense. You may not use technical measures to
cation. Examples of transparent image formats obstruct or control the reading or further copy-
include PNG, XCF and JPG. Opaque formats ing of the copies you make or distribute. How-
include proprietary formats that can be read ever, you may accept compensation in exchange
and edited only by proprietary word processors, for copies. If you distribute a large enough num-
SGML or XML for which the DTD and/or pro- ber of copies you must also follow the conditions
cessing tools are not generally available, and the in section 3.
GNU Free Documentation Licence 353
You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements”.

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation.
If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy’s public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

11. RELICENSING

“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means any set of copyrightable works thus published on the MMC site.

“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

“Incorporate” means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is “eligible for relicensing” if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

    Copyright © YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with . . . Texts.” line with this:

    with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.