
\documentclass[a4paper]{article}

\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage[colorinlistoftodos]{todonotes}
\newcommand{\vect}[1]{\boldsymbol{\mathrm{#1}}}
\usepackage[T1]{fontenc}
\usepackage{amssymb}
\linespread{1.5}

\begin{document}

\title{Econometrics}

\author{$\mathcal{D}$avide Pietrobon$\ddagger$\footnote{$\ddagger$ 834949},
$\mathcal{A}$lberto Marcato$\natural$\footnote{$\natural$ 830265}}

\date{\today}

\maketitle

\section{Introduction}
We study the dynamics of the U.S. Energy Consumer Price Index for All Urban Consumers
(Energy CPI-U) from 1957 to 2014. As we try to uncover the Data Generating Process of the
Energy CPI-U time series, we find evidence of the existence of a unit root. Moreover, we discover
the presence of significant ARCH effects in the residuals of the estimated model. Having
assumed that the innovation process follows a GARCH, we proceed with the estimation and
comparison of different model specifications. The best statistical model for the error terms we
are able to compute is a skewed-t GARCH(2,1). We conclude the paper with a simple
forecasting exercise.

\section{The Consumer Price Index for All Urban Consumers: Energy}

\subsection{The Consumer Price Index}
\numberwithin{equation}{section}
\numberwithin{figure}{section}
\numberwithin{table}{section}
\newtheorem{theorem}{Theorem}[section]

The Consumer Price Index (CPI) is a measure of the average change over time in the prices of
consumer commodities (goods and services that are purchased and consumed by the
household sector). The index provides an estimate of the average price variation between any
two dates. The CPI follows the prices of a sample of commodities in various categories of
consumer expenditure (such as food, clothing, housing, and medical services): Movements in
the CPI derive from weighted averages of changes in the prices of the commodities in its
bundle. A sample commodity price change is the ratio of its price at the current date to its
price at a previous date, and a sample commodity weight is the share of consumer expenditure
that it represents (the algebraic formulas used for this averaging are called index number
formulas). The CPI has a monthly frequency. Each monthly index value displays the
average change in the prices of consumer commodities since a base date, which is 1982-1984
for most indexes. For example, the CPI for March 2002 is 178.8. One interpretation of this is
that a representative bundle of consumer commodities that costs \$100.00 in 1982-1984
would have cost \$178.80 in March 2002.


The CPI for All Urban Consumers (CPI-U) refers to the purchases of the residents of urban and
metropolitan areas in the United States. The Energy CPI-U is a special aggregate index which is
based on a number of consumer commodities such as fuel oil, propane, kerosene, firewood,
gas, and electricity.

\subsection{The Energy CPI-U Time Series: Descriptive Analysis}

This subsection is concerned with the analysis and description of the Energy CPI-U time series.
Descriptive analysis of a time series is a fundamental procedure for the uncovering of the
characteristics of its Data Generating Process (DGP).


The data we work with come from the Bureau of Labor Statistics, U.S. Department of Labor.
Frequency is monthly and data have been seasonally adjusted. The data set ranges from
1:1957 to 3:2014 (687 observations). Here we report the time series graph, the histogram, and
the classical descriptive statistics associated to the Energy CPI-U time series (for the sake of
simplicity, we will refer to this series as $ \left \{ X_t \right \}_{t \in \tau}$ or $X_t$).

\begin{figure} [h]
\caption{Energy CPI-U: Time Series Graph}
\centering
\includegraphics[width=0.6\textwidth]{time_series_energy_cpi-u.pdf}
\end{figure}

\begin{figure} [h]
\caption{Energy CPI-U: Histogram}
\centering
\includegraphics[width=0.6\textwidth]{histogram_energy_cpi-u.pdf}
\end{figure}

\begin{table} [h]
\caption{Energy CPI-U: Descriptive Statistics}
\centering
\begin{tabular}{|l|r|}
\hline\hline
Mean & 93.958 \\
Median & 97.300 \\
Min & 21.300 \\
Max & 271.15 \\
Standard Deviation & 66.976 \\
Skewness & 0.82962 \\
Excess Kurtosis & -0.12052 \\
$5^{th}$ Percentile & 22.200 \\
$95^{th}$ Percentile & 242.83 \\
\hline\hline
\end{tabular}
\end{table}

We choose to begin with the examination of the empirical distribution characterizing the data
(Figure 2.2). It is quite clear that the data have been generated by a highly non-Gaussian DGP.
In order to give statistical support to our intuition we rely on the famous Jarque-Bera (JB)
normality test. One of the most intuitive ways to test for normality is to compare the sample
skewness and kurtosis with their respective theoretical values under the assumption of
normality (0 and 3, respectively). The following simple tests can be constructed

\begin{subequations}
\begin{equation}
\begin{split}
\text{Skewness test:} \quad Z_s&=\sqrt{N}\frac{S}{\sqrt{6}}\\
\text{Kurtosis test:} \quad K_s&=\sqrt{N}\frac{K-3}{\sqrt{24}}
\end{split}
\end{equation}

where $S$ is the sample skewness, $K$ is the sample kurtosis and $N$ is the sample size. Both
test statistics are distributed as standard normal under the null hypothesis of normality. The JB
test statistic is nothing more than the sum of the squares of the previous two test statistics,
that is,

\begin{equation}
JB=N \left( \frac{S^2}{6}+\frac{(K-3)^2}{24} \right)
\end{equation}
\end{subequations}

which is distributed as a chi-squared with two degrees of freedom\footnote{Let
$Z_1,\dots,Z_n$ be a sequence of independent standard normal random variables. Then, by
definition, the sum of their squares is distributed according to the chi-squared distribution
with $n$ degrees of freedom.}. By construction, when the sample skewness is $0$ and the sample kurtosis is $3$ the
JB test statistic equals zero. Intuitively, the farther the JB test statistic is from $0$, the stronger
the evidence that the empirical distribution is not normal. The JB test gives the following
results\\

\begin{itemize}
\item Jarque-Bera test statistic: 79.2227, p-value: 6.26626e-018\\
\end{itemize}

Thus, we reject the null hypothesis of normality. In particular, we can see from Figure 2.2 that
the empirical distribution is platykurtic and right skewed\footnote{Platykurtosis, or negative
excess kurtosis, is associated with distributions that are relatively flat. Positive skewness indicates
that the data are skewed right, i.e., that the right tail of the distribution is long relative to the
left tail.}. Moreover, the empirical distribution is multimodal.
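

For reference, statistics of this kind can be computed along the following lines (a minimal
Python sketch; the file name \texttt{energy\_cpiu.csv} and the column name are hypothetical
placeholders, not the actual names in our data set).

\begin{verbatim}
import numpy as np
import pandas as pd
from scipy import stats

# Placeholder file and column names: adjust to the actual data set.
x = pd.read_csv("energy_cpiu.csv")["energy_cpiu"].to_numpy()
n = x.size

s = stats.skew(x)                    # sample skewness S
k = stats.kurtosis(x, fisher=False)  # sample kurtosis K (not excess)

z_s = np.sqrt(n) * s / np.sqrt(6)          # skewness test statistic
z_k = np.sqrt(n) * (k - 3) / np.sqrt(24)   # kurtosis test statistic
jb = z_s**2 + z_k**2                       # Jarque-Bera statistic
p_value = stats.chi2.sf(jb, df=2)          # chi-squared with 2 d.o.f.
\end{verbatim}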


We now turn to the examination of the time series graph (Figure 2.1). It is quite clear that the
data we observe constitute a trajectory (or path) of a nonstationary stochastic process. In
order to give statistical support to our intuition we rely on the Augmented Dickey-Fuller (ADF)
test, which is one of the most widely used unit root tests in econometrics. The simplest approach to
testing for the presence of unit roots begins with the specification of an autoregressive model
of order one (AR(1)). In our case, such a model can be expressed as follows

\begin{subequations}
\begin{equation}
X_t=\theta+\alpha X_{t-1}+\varepsilon_t
\end{equation}

where $\theta,\alpha \in \textbf{R}$ and $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is a white
noise process. Consider the following hypothesis

\begin{equation}
\begin{split}
H_0&: |\alpha|=1 \Leftrightarrow \ X_t \sim I(1)\\
H_1&: |\alpha|<1 \Rightarrow \ X_t \sim I(0)
\end{split}
\end{equation}

where $I(i)$ denotes an integrated process\footnote{Consider a stochastic process $\left
\{ {X_t} \right \}_{t \in \tau}$, where $X_t$ is a random variable defined on a sample space
$\Omega$ (where $(\Omega,\mathcal{A},P)$ is a probability space) taking values on a set
$S$ (where $(S,\Sigma)$ is a measurable space), $\forall t \in \tau$, and $\tau$ is a totally
ordered set. Let $X_t$ be an AR($p$) process. Then, we have $\alpha(L)X_t=\varepsilon_t$,
where $\alpha(L) \equiv (1-\alpha_1 L-\alpha_2 L^2-\dots-\alpha_p L^p)$, $\alpha_i \in
\textbf{R}$, $\forall i \in \left \{ 1,2,\dots,p \right \}$, $L$ denotes the lag operator, and $\left
\{ \varepsilon_t \right \}_{t \in \tau}$ is a white noise process. An AR($p$) process is
covariance-stationary if and only if the roots of the characteristic polynomial in $L$ lie outside
the unit circle in $\textbf{C}$. An AR($p$) process is said to be integrated of order $m$ if $L
\equiv z=1$ is a root of multiplicity $m$ of the characteristic polynomial in $L$ (that is,
$\alpha(L)=(1-L)^m \phi(L)$, where $\phi(L)$ is a polynomial of degree $p-m$ such that
$\phi(1)\ne0$).} of order $i$, $i=1,0$. The Dickey-Fuller (DF) test is meant to test hypothesis
(2.2b). Recall that being an I(1) process is a sufficient but not necessary condition for being a
nonstationary process (e.g., if $|\alpha|>1$, the root of the characteristic polynomial in $L$ lies
inside the unit circle in the complex plane, so the process is nonstationary although it is not I(1)), while
being an I(0) process is a necessary but not sufficient condition for being a stationary process
(e.g., if $|\alpha|>1$ the process has no unit roots but it is nonstationary). However, note that
the hypotheses (2.2b) are built in such a way that if $|\alpha|>1$ both the null and the
alternative must be rejected. That is, the DF test is always able to tell if an AR(1) process is
covariance-stationary or not (if the null is rejected and the alternative is accepted the process
is covariance-stationary, if the null is accepted or both the null and the alternative are rejected
the process is nonstationary). The DF test statistic is given by

\begin{equation}
t_{\alpha=1}=\frac{\widehat{\alpha_{OLS}}-1}{s.e.(\widehat{\alpha_{OLS}})}
\end{equation}

where $\widehat{\alpha_{OLS}}$ denotes the Ordinary Least Square (OLS) estimate of
$\alpha$ and $s.e.(\widehat{\alpha_{OLS}})$ denotes the standard error of the OLS estimate
of $\alpha$. It can be shown that under the null ($\alpha=1$) $\widehat{\alpha_{OLS}}$ is not
asymptotically normally distributed and $t_{\alpha=1}$ is not asymptotically distributed
according to a standard normal distribution: The limiting distribution of the DF test statistic
$t_{\alpha=1}$ under the null does not have a closed form representation, and it is usually
referred to as the DF distribution. The DF test needs to be augmented if an AR(1) model is not
enough. Indeed, many economic time series have a complicated dynamic structure, which
cannot be described by means of a simple AR(1) model. For example, if there is serial
correlation in the error terms in equation (2.2a), the true process is probably not an AR(1). For
these reasons, when a statistician (or client) wants to test the presence of unit roots in a time
series, it is customary to rely on the ADF test. The latter accommodates general AR($p$)
processes

\begin{equation}
X_t=\theta+\sum_{j=1}^{p}\alpha_jX_{t-j}+\varepsilon_t
\end{equation}

where $\theta,\alpha_i \in \textbf{R}$, $\forall i \in \left \{ 1,2,\dots,p \right \}$, and $\left
\{ \varepsilon_t \right \}_{t \in \tau}$ is a white noise process. If the characteristic polynomial
(of order $p$) in $L$, $\alpha_p(L) \equiv (1-\alpha_1 L-\alpha_2 L^2-\dots-\alpha_p L^p)$,
has a unique unit root we have $\alpha_p(L)=(1-L)\psi_{p-1}(L)$, where $\psi_{p-1}(L)$ is a
polynomial of order $p-1$ such that $\psi_{p-1}(1) \ne 0 $\footnote{Let $P(x)$ be a degree
$n$ polynomial. $a$ is a root of $P(x)$ if and only if there exists a degree $n-1$ polynomial
$Q(x)$ such that $P(x)=(x-a)Q(x)$.}. Thus, equation (2.2d) can be rearranged in the following
manner

\begin{equation}
\psi_{p-1}(L)\Delta X_t=\theta+\varepsilon_t
\end{equation}

where $\Delta \equiv (1-L)$ is the difference operator. Note that $\alpha_1(L)=(1-
L)\psi_0(L)$ and $\psi_0(L)=1$, since $\psi_{p-1}(L)=1-\psi_1L-\dots-\psi_{p-1}L^{p-1}$. Thus,
$\alpha_1(L) \equiv (1-\alpha_1 L) =(1-L)$ and the previous equation can be rearranged in the
following manner

\begin{equation}
X_t=\alpha_1X_{t-1}+\sum_{j'=1}^{p-1}\psi_{j'}\Delta X_{t-j'}+\theta+\varepsilon_t
\end{equation}

or, equivalently

\begin{equation}
\Delta X_t=\gamma X_{t-1}+\sum_{j'=1}^{p-1}\psi_{j'}\Delta X_{t-j'}+\theta+\varepsilon_t
\end{equation}
\end{subequations}

where $\gamma \equiv \alpha_1-1$. Hence, testing for a unit root in model (2.2d) is
equivalent to testing for $\gamma=0$ in the previous model (since we assumed that $\psi_{p-
1}(L)$ has roots lying outside the unit circle). Again, the limiting distribution of the relevant
test statistic does not admit a closed form representation. Such distribution is usually referred
to as the ADF distribution. Finally, a fundamental practical issue for the implementation of the
ADF test is the choice of the number of lags $p$. Intuitively, if $p$ is too small there will be
serial correlation in the error terms of model (2.2d) which will bias the test. On the other hand,
if $p$ is too large the power of the test will suffer. We choose to specify 12 lags (in doing so,
we followed Gretl's suggestion). The ADF test gives the following results\\

\begin{itemize}
\item ADF test statistic: 1.26483, asymptotic p-value: 0.9986\\
\end{itemize}

The null hypothesis is rejected when the ADF test statistic is lower than a critical value which
depends on the sample size and the specification of the model (with or without trend). The
critical value associated with a model without trend when the sample size is larger than 500 is
approximately equal to $-3.43$. Thus, the null is not rejected. We conclude that the data are a
trajectory of a nonstationary stochastic process having at least one unit root. Recall that the
ADF test does not discriminate between the presence of one or more unit roots. In order to
check whether the multiplicity of the unit root is greater than one we repeat the test on the
first difference of the Energy CPI-U time series\footnote{Indeed, let $ \left \{ X_t \right \}_{t \in
\tau}$ be an $I(1)$ process. Then, $ \left \{ \Delta X_t \right \}_{t \in \tau}$ is an $I(0)$ process.}.
Following this method, we can easily verify whether there exists another unit root or not. If the
presence of unit roots in $\left \{ \Delta X_t \right \}_{t \in \tau}$ is not rejected we may continue
with $\left \{ \Delta^2 X_t \right \}_{t \in \tau}$ and rely again on the ADF test.
This procedure can be repeated until the test indicates the absence of unit roots. When
applied to $\left \{ \Delta X_t \right \}_{t \in \tau}$ the ADF test gives the following results
(again, we choose to specify 12 lags)

\begin{itemize}
\item ADF test statistic: -7.66108, asymptotic p-value: 4.79e-012\\
\end{itemize}

The null is rejected ($-7.66<-3.43$). Thus, we conclude that $\left \{ \Delta X_t \right \}_{t \in
\tau}$ is nonintegrated ($I(0)$). This implies that $\left \{ X_t \right \}_{t \in \tau}$ is a
trajectory of an $I(1)$ process (e.g., a random walk, a random walk with drift, or an
AutoRegressive Integrated Moving Average (ARIMA) process). To be precise, though, we should
point out that the rejection of the null hypothesis $H_0: X_t \sim I(1)$ does not imply the
acceptance of the alternative $H_1: X_t \sim I(0)$. Nevertheless, consider the time series
graph of $\left \{ \Delta X_t \right \}_{t \in \tau}$

\begin{figure} [h]
\caption{First Difference of Energy CPI-U: Time Series Graph}
\centering
\includegraphics[width=0.6\textwidth]{time_series_first_diff_energy_cpi-u.pdf}
\end{figure}

The data clearly fluctuate around a constant level. Visual inspection strongly
suggests that $\left \{ \Delta X_t \right \}_{t \in \tau}$ is covariance-stationary.
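

The unit root tests reported above can be reproduced, for instance, with the
\texttt{adfuller} routine of the Python \texttt{statsmodels} package (a sketch under the
assumption that the array \texttt{x} already holds the Energy CPI-U series, as in the
previous sketch).

\begin{verbatim}
import numpy as np
from statsmodels.tsa.stattools import adfuller

# x: the Energy CPI-U series (assumed already loaded, see the previous sketch).
# 12 lagged differences, constant but no trend, as in the text.
adf_level = adfuller(x, maxlag=12, regression="c", autolag=None)
adf_diff = adfuller(np.diff(x), maxlag=12, regression="c", autolag=None)

print("ADF on levels:          ", adf_level[0], adf_level[1])
print("ADF on first difference:", adf_diff[0], adf_diff[1])
\end{verbatim}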

\subsection{The Energy CPI-U Time Series: Data Generating Process}

This subsection is concerned with the task of finding a proper statistical model for the Energy
CPI-U time series (intuitively, a statistical model or Data Generating Process (DGP) is a formal
description of the existing relations between a set of random variables and a set of other
variables). In subsection 2.2 we concluded that the Energy CPI-U time series is likely to be a
trajectory of an $I(1)$ process. This conclusion was drawn from the results of the ADF test, as
applied to the Energy CPI-U time series (the null hypothesis of presence of a unit root was
accepted) and to the first difference of the Energy CPI-U time series (the alternative hypothesis
of covariance-stationarity was accepted). There are several stochastic processes which can
easily accommodate these stylized facts. Hereafter we give a brief description of three different
stochastic processes that are easy to handle and that could prove to be proper statistical
models for the Energy CPI-U time series.


We begin with a random walk, which can be conceived as the simplest $I(1)$ process. Let
$ \left \{ Y_t \right \}_{t \in \tau}$ be a stochastic process. Let $\tau \subseteq \textbf{Z}$.
Then, $ \left \{ Y_t \right \}_{t \in \tau}$ is a random walk if $Y_t$ can be expressed in the
following manner

\begin{equation}
Y_t=Y_{t-1}+\varepsilon_t
\end{equation}

$\forall t \in \tau$, where $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is a white noise process.
A random walk process is an AR(1) process without intercept such that $\alpha=1$. It is easy to
verify that a random walk is not stationary: In particular, the variance of a random walk does
depend on the time index $t$. More importantly, note that the rational expectation of the
value of $Y$ tomorrow ($\mathrm{E_t}[Y_{t+1}]$) is the value of $Y$ today ($Y_t$). In other
words, when dealing with a random walk process it is impossible to forecast how $Y$ changes
from date $t$ to date $t+1$, $\forall t \in \tau$ (i.e., $\Delta Y_t$ is random). That is, if the
Energy CPI-U time series were effectively a trajectory of a random walk process, we could
conclude that the variation in the average change over time in the consumer price of energy is
unpredictable.


The second $I(1)$ process we choose to present is a random walk with drift. $ \left \{ Y_t \right
\}_{t \in \tau}$ is a random walk with drift if $Y_t$ can be expressed in the following manner

\begin{equation}
Y_t=\mu+Y_{t-1}+\varepsilon_t
\end{equation}

where $\mu \in S$ is a constant and $S$ is the domain of $Y_t$, $\forall t \in \tau$. Note
that the first difference of a random walk process (with or without drift) is a covariance-
stationary and weakly dependent (and, consequently, an $I(0)$) stochastic process.


The third and last $I(1)$ process we choose to present is an ARIMA($p,1,q$) process. Consider
the following ARMA($p,q$) process

\begin{subequations}
\begin{equation}
\alpha_p(L)Y_t=\xi_q(L)\varepsilon_t
\end{equation}

where $\alpha_p(L) \equiv 1-\alpha_1L-\dots-\alpha_pL^p$, $\alpha_i \in \textbf{R}$,
$\forall i \in \left \{ 1,2,\dots,p \right \} $, $\xi_q(L) \equiv 1+\xi_1L+\dots+\xi_qL^q$,
$\xi_{i'} \in \textbf{R}$, $\forall i' \in \left \{ 1,2,\dots,q \right \} $, and $ \left \{ \varepsilon_t
\right \}_{t \in \tau}$ is a white noise process. If $L \equiv z=1$ is a root of multiplicity $d$ of
the AR characteristic polynomial in L we say that the previous ARMA($p,q$) process is an
ARIMA($p,d,q$) process. Since the AR($p$) characteristic polynomial is a degree
$p$ polynomial, such polynomial has $p$ roots\footnote{By the Fundamental Theorem of
Algebra, every nonzero, single-variable, degree n polynomial with complex coefficients has,
counted with multiplicity, exactly n roots}. Since the AR characteristic polynomial has a unit
root of multiplicity $d$, it can be factorized in the following manner

\begin{equation}
\alpha_p(L) \equiv 1-\sum_{j=1}^{p}\alpha_jL^j=\Delta^d \psi_{p-d}(L)
\end{equation}

where $\psi_{p-d}(L)$ is a polynomial of degree $p-d$ such that $\psi_{p-d}(1)\ne0$. In the case of an
ARIMA($p,1,q$) process $\alpha_p(L)=\Delta \psi_{p-1}(L)$. Thus, the ARIMA($p,1,q$) process
can be expressed in the following manner

\begin{equation}
\psi_{p-1}(L)\Delta Y_t=\xi_q(L)\varepsilon_t
\end{equation}
\end{subequations}

Which of the three aforementioned processes is a better statistical model for the Energy CPI-U
time series? In order to answer the previous question, we need to look at the sample
AutoCorrelation Function (ACF) and the sample Partial AutoCorrelation Function (PACF)
characterizing the Energy CPI-U time series. Hereafter, we briefly describe ACF and PACF. The
autocovariance function is given by

\begin{subequations}
\begin{equation}
\gamma_{t,k} \equiv \mathrm{Cov}[Y_t,Y_{t-k}] \equiv \mathrm{E}[(Y_t-\mu_t)(Y_{t-k}-
\mu_{t-k})]
\end{equation}

where $\mu_t \equiv \mathrm{E}[Y_t]$, $\forall t \in \tau$, and $\tau \subseteq \textbf{Z}$.
By definition, equation (2.6a) can be expressed in the following manner

\begin{equation}
\gamma_{t,k} \equiv \int \dots \int f(Y_t,\dots,Y_{t-k})(Y_t-\mu_t)(Y_{t-k}-\mu_{t-k}) \,
\mathrm{d}Y_t\, \dots\, \mathrm{d}Y_{t-k}
\end{equation}
\end{subequations}

where $f(Y_t,\dots,Y_{t-k})$ is the joint probability density function (PDF) of random variables
$Y_t,\dots,Y_{t-k}$. The partial autocovariance function is given by

\begin{subequations}
\begin{equation}
\tilde{\gamma}_{t,k} \equiv \int\int f(Y_t,Y_{t-k}\,|\,Y_{t-1},\dots,Y_{t-k+1})(Y_t-\mu_t)(Y_{t-k}-\mu_{t-k}) \,
\mathrm{d}Y_t\, \mathrm{d}Y_{t-k}
\end{equation}

where

\begin{equation}
f(Y_t,Y_{t-k}\,|\,Y_{t-1},\dots,Y_{t-k+1})=\frac{f(Y_t,\dots,Y_{t-k})}{\int\int f(Y_t,\dots,Y_{t-k}) \, \mathrm{d}Y_t\, \mathrm{d}Y_{t-k}}
\end{equation}
\end{subequations}

That is, the partial autocovariance between $Y_t$ and $Y_{t-k}$ is the autocovariance between
$Y_t$ and $Y_{t-k}$ with the dependence of $Y_t$ through to $Y_{t-
k+1}$ removed\footnote{In other words, the partial autocovariance between $Y_t$ and $Y_{t-
k}$ is the autocovariance between $Y_t$ and $Y_{t-k}$ conditional on the in-between values of
the time series $Y_{t-1},\dots,Y_{t-k+1}$}. The ACF is given by

\begin{equation}
\rho_{t,k} \equiv \frac{\gamma_{t,k}}{\sqrt{\gamma_{t,0}}\sqrt{\gamma_{t-k,0}}}
\end{equation}

and the PACF is given by

\begin{equation}
\tilde{\rho}_{t,k} \equiv
\frac{\tilde{\gamma}_{t,k}}{\sqrt{\tilde{\gamma}_{t,0}}\sqrt{\tilde{\gamma}_{t-k,0}}}
\end{equation}

Here we report the sample correlogram and partial correlogram associated to the Energy CPI-
U time series

\begin{figure} [h]
\caption{Energy CPI-U: Correlogram and Partial Correlogram}
\centering
\includegraphics[width=0.6\textwidth]{DDDD.jpg}
\end{figure}
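
Correlograms of this kind can be obtained, for instance, with \texttt{statsmodels} (a sketch;
\texttt{x} again denotes the Energy CPI-U series and the output file name is a placeholder).

\begin{verbatim}
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# x: the Energy CPI-U series (assumed already loaded).
fig, (ax_acf, ax_pacf) = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(x, lags=36, ax=ax_acf, title="Sample ACF, Energy CPI-U")
plot_pacf(x, lags=36, ax=ax_pacf, title="Sample PACF, Energy CPI-U")
fig.tight_layout()
fig.savefig("correlogram_energy_cpiu.pdf")  # placeholder output name
\end{verbatim}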

The blue lines to both sides of the zero axis enable the client to assess which autocorrelation
coefficients are significant. Visual inspection suggests that the Energy CPI-U time series is
extremely persistent. Indeed, the sample autocorrelation is almost constant. On the other
hand, the sample partial autocorrelation has only one significant value at the first lag, if we
exclude the sample partial autocorrelation at the second lag, which is significant only by a
hair's breadth. The joint interpretation of the sample ACF and PACF is consistent with the
results given by the ADF test (see subsection 2.2). Indeed, the value of the sample
autocorrelation and partial autocorrelation at the first lag (0.9940) is also the estimate of
$\alpha_1$ in equation (2.2d)\footnote{Let $X_t=\theta+\alpha_1X_{t-1}+\dots+\alpha_pX_{t-
p}+\varepsilon_t$, where $\theta,\alpha_i \in \textbf{R}$, $\forall i \in \left \{ 1,2,\dots,p
\right \}$, and $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is a white noise process. Assume the
roots of the characteristic polynomial in L lie outside the unit circle in \textbf{C}. Then, the
process is covariance-stationary and admits a representation as a MA($\infty$) process.
For $p=1$, $\mathrm{Cov}[X_t,X_{t-1}]=\mathrm{Cov}[\alpha_1X_{t-1},X_{t-1}]+\mathrm{Cov}[\varepsilon_t,X_{t-1}]=\alpha_1\mathrm{V}[X_{t-1}]$. Thus, $\rho_{t,1}
\equiv \alpha_1 \mathrm{V}[X_{t-1}]/(\sqrt{\mathrm{V}[X_{t}]}\sqrt{\mathrm{V}[X_{t-1}]})=\alpha_1 \equiv \rho_1$ (since, by
assumption, the process is covariance-stationary, which implies $\mathrm{V}[X_t]=\mathrm{V}[X_{t-1}]$).}. In the
same manner, it is easy to prove that the sample partial autocorrelation (but not the sample
autocorrelation) at lag $n$ coincides with $\alpha_n$ in equation (2.2d), $\forall n \in \left
\{2,3,\dots,p \right \}$. Thus, the analysis of the sample partial autocorrelation associated with
the Energy CPI-U time series suggests a statistical model such as

\begin{equation}
X_t=\theta+X_{t-1}+\sum_{j'=1}^{q}\xi_{j'}\varepsilon_{t-j'}+\varepsilon_t
\end{equation}

which is an ARIMA($0,1,q$) process. Equation (2.10) is conceived as the DGP associated with the
Energy CPI-U time series.

\subsection{The Energy CPI-U Time Series: Model Selection}

This subsection is concerned with the estimation of the DGP associated to the Energy CPI-U
time series (equation (2.10)). As is clear, there are different potentially (i.e., \textit{a priori})
proper statistical models for the Energy CPI-U time series.


One natural choice is to begin with the estimation of the random walk with drift model
(equation (2.4)). This choice is regarded as natural because the random walk model is the
simplest potentially proper (that is, $I(1)$) statistical model for the time series, and because
adding a drift makes the model more general without increasing its complexity. We choose to
estimate the random walk model by means of OLS estimators. It is important to understand
why choosing OLS in this case is a good idea. Indeed, it is well known that when applied to
time series data OLS estimators generally lose their appealing features. Nevertheless, it is
possible to prove that if the assumptions of linearity (the model to be estimated is linear), no
perfect collinearity (roughly speaking, no explanatory variable is constant, nor can be
expressed as a linear combination of all the other explanatory variables) and zero conditional
mean of the error terms ($\mathrm{E}[\varepsilon_t|X]=\mathrm{E}[\varepsilon_t]=0$, where
$X$ is the matrix of regressors) hold, then OLS estimators are unbiased. Moreover, if we
assume innovations are homoskedastic and uncorrelated we have the fundamental Gauss-
Markov result: The OLS estimators are the most efficient linear unbiased estimators. Finally,
when dealing with inference procedures, if the additional assumption of normally distributed
innovations is satisfied the OLS estimators are normally distributed and, under the null, each t-
statistic is distributed as a t-distribution, and each F-statistic is distributed as a F-distribution.



Clearly, if some of the previous assumptions do not hold the outcomes of OLS estimation
are generally subject to severe problems. Unfortunately, these assumptions are not satisfied
by many economic time series. Nevertheless, there are other theorems that are useful in
these cases and provide additional justification for the use of OLS estimators. In particular,
assume linearity holds and the stochastic processes which generated the dependent and the
independent time series are covariance-stationary and weakly dependent. Then, OLS
estimators are consistent\footnote{Note that this theorem has nothing to say about the
unbiasedness of the estimators.}. Moreover, if innovations are spherical (homoskedastic and
nonautocorrelated) OLS estimators are asymptotically normally distributed and the usual t-
statistics and F-statistics are asymptotically valid. This theorem provides additional justification
for use of OLS estimators: Even if the classical assumptions do not hold, there exist sufficient
conditions to have consistency of OLS and validity of the usual inference procedures.


Finally, we turn to the case of interest to our analysis: The use of highly persistent time series
in regression analysis. A time series is said to be persistent if it is generated by a stochastic
process which is not characterized by covariance-stationarity and weak dependence. It is well
known that when applied to highly persistent time series OLS estimators are generally
inconsistent and usual inference procedures are invalid, and the danger of running a
spurious (or nonsense) regression emerges. Nevertheless, there are some important results
which, again, provide justification for the use of OLS estimators.

\begin{theorem}
Let $\left \{ Y_t \right \}_{t \in \tau}$ and $\left \{ W_t \right \}_{t \in \tau}$ be two
cointegrated stochastic processes (i.e., there exists a linear combination of $\left \{ Y_t \right
\}_{t \in \tau}$ and $\left \{ W_t \right \}_{t \in \tau}$ which is covariance-stationary).
Consider the regression model $Y_t=\theta+\alpha W_t+\varepsilon_t$, where $\theta,\alpha
\in \textbf{R}$, and $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is a white noise process. Then,
the OLS estimators of $\theta$ and $\alpha$ are consistent.
\end{theorem}

\begin{theorem}
Let $\left \{ Y_t \right \}_{t \in \tau}$ and $\left \{ W_t \right \}_{t \in \tau}$ be two
cointegrated stochastic processes (i.e., there exists a linear combination of $\left \{ Y_t \right
\}_{t \in \tau}$ and $\left \{ W_t \right \}_{t \in \tau}$ which is covariance-stationary).
Consider the regression model $Y_t=\theta+\alpha W_t+\varepsilon_t$, where $\theta,\alpha
\in \textbf{R}$, and $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is a white noise process. Then,
the OLS estimators of $\theta$ and $\alpha$ are superconsistent (that is, they converge to the
true values of the parameters at a faster rate than usual).
\end{theorem}

The previous two theorems are due to Stock (1987). Given these results, we are justified to
rely on OLS estimation. Indeed, assume the statistical model for the Energy CPI-U time series is
a random walk with drift. If this assumption holds true we know there exists a linear
combination of $\left \{ X_t \right \}_{t \in \tau}$ and $\left \{ X_{t-1} \right \}_{t \in
\tau}$ which is $I(0)$. Let $Z_t \equiv X_{t}-X_{t-1}$. As we have seen is subsection 2.2, $\left
\{ \Delta X_t \right \}_{t \in \tau}$ is $I(0)$. Thus, $\left \{ Z_t \right \}_{t \in \tau}$ is a linear
combination of $\left \{ X_t \right \}_{t \in \tau}$ and $\left \{ X_{t-1} \right \}_{t \in
\tau}$ which is $I(0)$. Not only. Note that

\begin{equation}
Z_t\equiv \theta+X_{t-1}-X_{t-1}+\varepsilon_t=\theta+\varepsilon_t
\end{equation}

which is trivially covariance-stationary (and weakly dependent) as long as $\left
\{ \varepsilon_{t} \right \}_{t \in \tau}$ is a white noise process. In other words, let the Energy
CPI-U time series be a trajectory of a random walk with drift process. Then, $Z_t$ is a linear
combination of $\left \{ X_t \right \}_{t \in \tau}$ and $\left \{ X_{t-1} \right \}_{t \in
\tau}$ which is covariance-stationary. Thus, we can invoke theorems 2.1 and 2.2 when
estimating regression model (2.2a).


As we have seen, assuming the statistical model for the Energy CPI-U time series is a random
walk with drift process implies $\Delta X_t=\theta+\varepsilon_t$, $\forall t \in \tau$. In this
case, figure 2.3 not only represents the first difference of the Energy CPI-U time series but it
also depicts a trajectory of the innovation process $\left \{ \varepsilon_{t} \right \}_{t \in
\tau}$, up to a real constant $\theta$. As we have seen (see subsection 2.2), the ADF test in fact
intimates that $\left \{ \Delta X_{t} \right \}_{t \in \tau}$ is generated by an I(0) process.
Moreover, visual inspection of figure 2.3 strongly suggests that $\left \{ \Delta X_t \right \}_{t
\in \tau}$ is covariance-stationary. However, it is clear that the volatility of $\left \{ \Delta
X_{t} \right \}_{t \in \tau}$ is not constant. On the contrary, the variance of $\left \{ \Delta
X_{t} \right \}_{t \in \tau}$ is increasing over time. This stylized fact goes against the
assumption that the Energy CPI-U time series is a trajectory of a simple random walk with drift
process and suggests that an AutoRegressive Conditional Heteroskedastic (ARCH) or a
Generalized AutoRegressive Conditional Heteroskedastic (GARCH) process may be a better
statistical model for the innovation process in (2.2a)\footnote{see subsection 2.5.}.


Here we report the output of the OLS regression (2.2a) and the associated scatter plot

\begin{table} [h]
\caption{OLS estimation of random walk with drift model}
\centering
\begin{tabular}{|c|c|c|c|c|c|}
\hline\hline
& Coefficient & s.e. & t-ratio & p-value & \\
\hline
Constant & 0.178671 & 0.227455 & 0.7855 & 0.43242 & \\
$X_{t-1}$ & 1.00159 & 0.00197686 & 506.6563 & $<0.00001$ & *** \\
\hline\hline
\end{tabular}
\begin{tabular}{|c c|c c|}
Mean dep. var. & 94.06368 & St. Dev. dep. var. & 66.96747\\
$\mathrm{RSS}$ & 8163.791 & s.e. regression & 3.454760\\
$\mathrm{R}^2$ & 0.997342 & Adjusted $\mathrm{R}^2$ & 0.997339\\
$\mathrm{F}(1,684)$ & 256700.6 & p-value ($\mathrm{F}$) & 0.000000\\
$\log$-likelihood & -1822.861 & Akaike info. criterion & 3649.722\\
Schwarz criterion & 3658.784 & Hannan-Quinn & 3653.228\\
$\rho$ & 0.409420 & Durbin-Watson stat. & 1.18114\\
\hline\hline
\end{tabular}
\end{table}

\begin{figure} [h]
\caption{Energy CPI-U$_t$ vs Energy CPI-U$_{t-1}$, OLS estimation}
\centering
\includegraphics[width=0.6\textwidth]{ols_random_walk.pdf}
\end{figure}
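
For reference, a regression of this form can be estimated with ordinary least squares in
\texttt{statsmodels} (an illustrative sketch; \texttt{x} denotes the Energy CPI-U series, as in
the earlier sketches, and the output reported in Table 2.2 was not produced by this code).

\begin{verbatim}
import statsmodels.api as sm

# Regress X_t on a constant and X_{t-1} (model (2.2a) with p = 1).
y = x[1:]                     # X_t
Z = sm.add_constant(x[:-1])   # (1, X_{t-1})
ols_res = sm.OLS(y, Z).fit()
print(ols_res.summary())      # coefficients, R^2, Durbin-Watson, etc.
\end{verbatim}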

First of all, we need to analyze the residuals of the regression. Concentrating first on residual
analysis is fundamental, since many of the properties of OLS estimators depend on the
characteristics of the residuals.


One of the problems that emerge with the estimation of the simple random walk with drift
model lies in the serial correlation of the residuals. This can be easily seen by looking at the
value of the Durbin-Watson statistic. Intuitively, the Durbin-Watson test checks whether there
is serial correlation in the residuals associated to a regression model. More formally, the
Durbin-Watson test tests the null hypothesis that the residuals associated with an OLS regression
are not autocorrelated against the alternative that the residuals are generated by an AR(1)
process. Thus, the hypotheses associated with the Durbin-Watson test are

\begin{subequations}
\begin{equation}
\begin{split}
H_0&: \rho=0\\
H_1&: \rho\ne0
\end{split}
\end{equation}

where $\rho$ is the first order autocorrelation in the innovation process. The Durbin-Watson
test statistic is given by

\begin{equation}
d=\frac{\sum_{t=2}^{T}(e_t-e_{t-1})^2}{\sum_{t=1}^{T}e_t^2}
\end{equation}

where $T$ is the sample size, and $e_t=X_t-\hat{X_t}$ is the residual associated to observation
$t$, where $\hat{X_t}$ is the estimate of $X$ at date $t$. It can be shown that the value of
$d$ lies in the closed and bounded interval $[0,4]$ and that

\begin{equation}
d \approx 2(1-\hat{\rho})
\end{equation}

where $\hat\rho$ is the sample autocorrelation in the residuals at the first lag. Thus, if
$d=2$ there is (approximately) no sample autocorrelation at the first lag, while if
$d<2$ ($d>2$) there is positive (negative) sample autocorrelation. As a last remark, recall that
the Durbin-Watson statistic is biased for models with lagged dependent variables, so that
autocorrelation in the residuals is generally underestimated. In order to overcome this problem,
it is possible to compute the following asymptotically unbiased and normally distributed test
statistic

\begin{equation}
h=\left( \frac{2-d}{2} \right) \sqrt{\frac{T}{1-T\widehat{\mathrm{V}[\hat{\alpha}_1]}}}
\end{equation}
\end{subequations}

where $\widehat{\mathrm{V}[\hat{\alpha}_1]}$ is the estimated variance of the OLS
estimate of $\alpha_1$, where $\alpha_1$ is the coefficient associated with $X_{t-1}$. Note that
in Table 2.2 $d=1.18114$. Thus, there is evidence of positive serial correlation in the
innovation process. The existence of autocorrelation in the error terms probably indicates a
misspecification in the proposed model. Usually, the Durbin-Watson test is used to detect
autocorrelation in the residuals at a first inspection. Thus, if the client prefers a more rigorous
analysis of residual serial correlation it is appropriate to rely on different statistical tests, such
as the Breusch-Godfrey and Ljung-Box tests.
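

The Durbin-Watson statistic $d$ and Durbin's $h$ can be computed from the OLS residuals,
for instance as follows (a sketch that continues the OLS estimation sketch above, where the
fitted model is stored in \texttt{ols\_res}).

\begin{verbatim}
import numpy as np
from statsmodels.stats.stattools import durbin_watson

e = ols_res.resid            # residuals of regression (2.2a)
d = durbin_watson(e)         # Durbin-Watson statistic

# Durbin's h, using the estimated variance of the coefficient on X_{t-1};
# only defined when T * var_alpha1 < 1.
T = e.size
var_alpha1 = ols_res.bse[1] ** 2
h = (2 - d) / 2 * np.sqrt(T / (1 - T * var_alpha1))
\end{verbatim}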


With the Breusch-Godfrey test we test the null of no serial correlation of any order up to $p$ in
the innovation process. This test assumes the error terms are generated by the following
AR($p$) model

\begin{subequations}
\begin{equation}
\varepsilon_t=\sum_{j=1}^{p}\rho_j\varepsilon_{t-j}+u_t
\end{equation}

where $\left \{ u_t \right \}_{t \in \tau}$ is a white noise process. The previous regression
model is estimated with OLS. Thus, we obtain a series of residuals $\left \{ \hat{u_t} \right \}_{t
\in \tau}$. The Breusch-Godfrey test statistic is given by

\begin{equation}
BG=(T-p)\mathrm{R}^{2}
\end{equation}
\end{subequations}

where $T$ is the sample size and $\mathrm{R}^{2}$ is the $\mathrm{R}^{2}$ associated to the
OLS estimation of model (2.13a). Breusch and Godfrey proved that when the null hypothesis
$H_0:\rho_1=\dots=\rho_p=0$ holds true $BG \sim \chi_p^2$. The Breusch-Godfrey test gives
the following results\\

\begin{itemize}
\item Breusch-Godfrey test statistic: 207.621957, p-value: 8.68e-038\\
\end{itemize}

where we choose $p=12$. The null hypothesis is rejected. Thus, we conclude that the lagged
residuals in regression model (2.13a) with $p=12$ are jointly significant, i.e., the residuals are serially correlated.
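

The Breusch-Godfrey statistic is available directly in \texttt{statsmodels} (a sketch, again
using the fitted model \texttt{ols\_res} from the OLS estimation sketch).

\begin{verbatim}
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# LM version of the Breusch-Godfrey test with 12 lags.
bg_stat, bg_pval, f_stat, f_pval = acorr_breusch_godfrey(ols_res, nlags=12)
\end{verbatim}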


With the Ljung-Box test we test the null of no serial correlation of any order up to $p$ in the
innovation process, as in the previous case. The test statistic is given by

\begin{equation}
Q_p=T(T+2)\sum_{k=1}^{p}\frac{\hat{\rho}_k^2}{T-k}
\end{equation}

where $T$ is the sample size and $\hat{\rho}_k$ is the sample autocorrelation at lag $k$. It can be
shown that under the null $Q_p \sim \chi_p^2$. The Ljung-Box test gives the following
results\\

\begin{itemize}
\item Ljung-Box test statistic: 174.612, p-value: 5.43e-031\\
\end{itemize}

where we choose $p=12$. The null hypothesis is rejected. Thus, the Ljung-Box test confirms
our previous result.
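

The Ljung-Box statistic can be computed in the same fashion (a sketch using the residuals of
the fitted model \texttt{ols\_res}).

\begin{verbatim}
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box statistic and p-value at lag 12.
lb = acorr_ljungbox(ols_res.resid, lags=[12])
print(lb)
\end{verbatim}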


We conclude that the innovation process (and, consequently, $\left \{ \Delta X_{t} \right \}_{t
\in \tau}$) is characterized by serial correlation. This fact may indicate a misspecification in the
dynamics of model (2.2a).


Another problem that emerges with the estimation of the simple random walk with drift
model lies in the heteroskedasticity of the residuals. As we have seen, the first difference of
the Energy CPI-U time series seems to exhibit clustered variance. Intuitively, this means that
the variance of such time series is not time independent. In order to check whether our
impression is statistically supported we rely on two statistical tests for heteroskedasticity:
the Breusch-Pagan and White tests.


With the Breusch-Pagan test we test whether the estimated variance of the residuals associated with
a regression depends on the explanatory variables (regressors). The Breusch-Pagan test is
extremely general, since it covers a wide array of situations of heteroskedasticity. Moreover, it
is an extremely simple test, since it is based on OLS residuals. Generally speaking, the Breusch-
Pagan test begins with the following regression model

\begin{subequations}
\begin{equation}
\vect{y}=\vect{X\beta}+\vect{\varepsilon}
\end{equation}

where $\vect{y}$ and $\vect{\varepsilon}$ are vectors of the Euclidean $T$-space,
$\vect{\beta}'$ is a vector of the Euclidean $k$-space and $\vect{X}$ is a ($T \times k$)
Euclidean matrix. We assume that the error terms are independent and normally distributed
with variance $\sigma_t^2=h(\left \langle \vect{z}_t, \vect{\gamma} \right \rangle)$, where
$h:\textbf{R} \rightarrow \textbf{R}$, $\vect{z}_t'=(1,z_{2,t},\dots,z_{p,t})$ is a vector of
real explanatory variables that we think may influence the variance of the innovation process,
and $\vect{\gamma}$ is a vector of coefficients of the Euclidean $p$-space. Since the first
element of $\vect{z}_t$ is assumed to be equal to one, the null hypothesis of constant
variance is equivalent to $\gamma_2=\gamma_3=\dots=\gamma_p=0$. Indeed, in this case
$\sigma_t^2=h(\gamma_1)$, which is constant. Finally, note that some of (at most all) the
explanatory variables in model (2.15a) can be lagged dependent variables. The Breusch-Pagan
test is performed in the following manner: First of all, one needs to regress $\vect{y}$ on
$\vect{X}$ by means of OLS. Secondly, given the vector of residuals associated to the
aforementioned regression, the following auxiliary regression model is specified

\begin{equation}
\hat{\varepsilon}_t^2=\gamma_1+\sum_{j=2}^p\gamma_jz_{j,t}+u_t
\end{equation}

where $\left \{ \hat{\varepsilon}_t \right \}_{t \in \tau}$ is the sequence of OLS residuals
associated to regression model (2.15a) and $\left \{ u_t \right \}_{t \in \tau}$ is a white noise
process. Obviously, the client needs to determine the variables in $\vect{z}_t$. However, the
functional form of $h$ need not be specified. Finally, one must compute the
$\mathrm{R}^2$ associated to auxiliary regression (2.15b). The Breusch-Pagan test statistic is
given by

\begin{equation}
BP=T\mathrm{R}^2
\end{equation}
\end{subequations}

where $T$ is the sample size and $\mathrm{R}^2$ is the $\mathrm{R}^2$ associated to
auxiliary regression (2.15b). It can be shown that under the null hypothesis of
homoskedasticity the test statistic is asymptotically distributed as a $\chi_{p-1}^2$. We choose
$p=2$ and $z_{2,t}=X_{t-1}$. The Breusch-Pagan test gives the following results\\

\begin{itemize}
\item Breusch-Pagan test statistic: 919.665115, p-value: 0.000000\\
\end{itemize}

We reject the null hypothesis of homoskedasticity. Thus, we conclude that there is
heteroskedasticity in the residuals of regression model (2.2a).


We now turn to the White test. Suppose we want to test the assumption of homoskedasticity in
the innovation process in model (2.2a). Assume we choose to rely on the Breusch-Pagan test but
we do not have a clue about the nature of $\vect{z}_t$. That is, imagine that we do not know
which variables may influence the variance of the error terms. Then, we can assume
$\vect{z}'_t=(1,X_{t-1},X_{t-1}^2)$. With this choice of explanatory variables the Breusch-Pagan test is
referred to as the White test. The White test gives the following results\\

\begin{itemize}
\item White test statistic: 46.446290, p-value: 0.000000\\
\end{itemize}

The null hypothesis is rejected. Thus, the White test confirms our previous result.
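

Both tests are implemented in \texttt{statsmodels} (a sketch; \texttt{x} and \texttt{ols\_res}
are the series and the fitted model of the earlier sketches).

\begin{verbatim}
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

# Variables of the auxiliary regression: a constant and X_{t-1}.
Z = sm.add_constant(x[:-1])
bp_stat, bp_pval, _, _ = het_breuschpagan(ols_res.resid, Z)

# het_white adds squares (and cross products) of the regressors itself.
w_stat, w_pval, _, _ = het_white(ols_res.resid, Z)
\end{verbatim}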


We conclude that the innovation process (and, consequently, $\left \{ \Delta X_{t} \right \}_{t
\in \tau}$) is characterized by heteroskedasticity. This highlights the fact that the error terms
in model (2.2a) cannot be assumed to be a simple white noise process. In particular, as noted
before, an ARCH or a GARCH process may be a better statistical model for $\left
\{ \varepsilon_t \right \}_{t \in \tau}$.

\subsection{Autocorrelation and Heteroskedasticity in the Residuals}

As we have seen in subsection 2.4, the residuals associated to regression model (2.2a) are
characterized by both serial correlation and heteroskedasticity.


The presence of autocorrelation in regression residuals is likely to indicate a misspecification in
model (2.2a). In other words, the Energy CPI-U time series is probably not a trajectory of a
simple random walk process. However, as we have seen in subsection 2.3 the PACF of the
Energy CPI-U time series has a very large spike at lag 1 and no other significant spikes. This
indicates that an AR(1) process is likely to be a good statistical model for the time series.
Moreover, the estimated AR(1) coefficient (which is the height of the PACF at lag 1) turned out
to be statistically equal to 1. Hence, we do not need to add any additional AR term.
Consequently, we search for the source of residual autocorrelation in the moving average part
of ARIMA process (2.10). The ACF characterizing the differenced time series plays the same
role for MA terms that the PACF (characterizing the nondifferenced time series) plays for the
AR terms. In other words, suppose the autocorrelation of the first difference of the Energy CPI-
U time series is significant at lag $k$ but not at any higher lag. Then, an MA($k$) process is
likely to be a good statistical model for such series. Unfortunately, we find that $\left
\{ \Delta X_t \right \}_{t \in \tau}$ is also quite persistent. In particular, the autocorrelation of the
first difference of the Energy CPI-U time series is significant up to lag 112. Since we cannot
handle the computational complexity characterizing the estimation of an ARIMA(0,1,112) we
choose to aggregate the Energy CPI-U time series, passing from monthly to yearly frequency.
There are several ways to perform temporal aggregation of time series. Since we do not expect
certain months to have more or less economic weight than others, we choose to use the
arithmetic mean as temporal aggregator. Thus, $X_T \equiv \frac{1}{12}\sum_{t(T)=1}^{12}X_{t(T)}$, for any
year $T$, where $t(T)$ indicates month $t$ in year $T$, $t(T) \in \left \{ 1,\dots,12 \right \}$.
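

Such an aggregation can be carried out, for example, with \texttt{pandas} (a sketch; the file
and column names are hypothetical placeholders).

\begin{verbatim}
import pandas as pd

# Monthly Energy CPI-U indexed by date (placeholder file/column names).
monthly = pd.read_csv("energy_cpiu.csv",
                      parse_dates=["date"], index_col="date")
# Arithmetic mean over the twelve months of each year.
yearly = monthly["energy_cpiu"].groupby(monthly.index.year).mean()
\end{verbatim}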


By applying such a temporal aggregation scheme we obtain a new time series, which we
denote $\left \{ \Xi_T \right \}_{T \in \mathbb{T}}$. As one can imagine, $\left \{ \Xi_T \right
\}_{T \in \mathbb{T}}$ is much less persistent than $\left \{ X_t \right \}_{t \in \tau}$.
Hereafter, we report the sample autocorrelogram and partial autocorrelogram associated to
$\left \{ \Delta\Xi_T \right \}_{T \in \mathbb{T}}$

\begin{figure} [h]
\caption{First Difference of yearly Energy CPI-U: Correlogram and Partial Correlogram}
\centering
\includegraphics[width=0.6\textwidth]{correlogram.png}
\end{figure}

Visual inspection of Figure 2.6 suggests that $\left \{ \Delta\Xi_T \right \}_{T \in
\mathbb{T}}$ is not persistent at all. Indeed, the sample autocorrelation and partial
autocorrelation are not significant at any lag. Thus, when aggregating the Energy CPI-U time
series the need of adding MA terms to model (2.2a) vanishes. For the sake of completeness,
we estimate model (2.2a) by means of OLS. We choose not to report the regression output
since the results are almost identical to those of Table 2.2. In particular, the constant
($\theta$) is statistically equal to zero, while the coefficient associated to regressor $\Xi_{T-
1}$ ($\alpha$) is statistically equal to one. The difference between the two regression models
lies in residual autocorrelation. Indeed, when estimating model (2.2a) using yearly data,
autocorrelation in the OLS residuals is ruled out. Thus, as we said before, we conclude that
specification (2.2a) is a good statistical model for the yearly Energy CPI-U time series. The
Durbin-Watson test ($d=2.07803$), the Breusch-Godfrey ($BG=0.130741$, $\mathrm{p\!-
\!value}=0.718$), and Ljung-Box ($Q=0.129921$, $\mathrm{p\!-\!value}=0.719$) tests at the
first lag confirm our intuition.


We now turn to the problem of heteroskedasticity in the residuals. One of the classical
assumptions in time series econometrics is that the innovation process in a regression model is
characterized by time independent variance. However, empirical investigation demonstrates
that the phenomenon of volatility clustering is extremely common in economic time series. For
example, time dependent variance processes are commonly employed in modeling financial
time series. Moreover, in the analysis of macroeconomic time series data, Engle (1982, 1983)
and Cragg (1982) found evidence that the variance of innovations is often less stable than
what is usually assumed.


One can try to take into account volatility clustering by assuming the innovation process
follows an AR. For example, one could assume $\varepsilon_t=\zeta\varepsilon_{t-1}+u_t$,
where $\zeta \in \textbf{R}$ and $ \left \{ u_t \right \}_{t \in \tau}$ is a white noise process.
Assume $\zeta \in (0,1)$. Then, a positive value of $\varepsilon_t$ will be followed by a positive value of
$\varepsilon_{t+1}$ with a probability greater than $1/2$. However, it turns out that this assumption is
inconsistent with respect to many of the stylized facts characterizing real world economic time
series. For example, in financial econometrics it is common to observe periods of relatively
high turbulence in which asset price volatility increases dramatically. Nevertheless, while
volatility is typically persistent (in the sense that a turbulent period tends to be followed by
another turbulent period), it is not generally true that a positive (that is, above expectation)
price today tends to be followed by a positive price tomorrow\footnote{An alternative view is
that an autoregressive behavior of asset prices would constitute a violation of the efficient
market hypothesis (which states that future prices cannot be predicted by past prices).}! Thus,
we need a different statistical model to explain heteroskedasticity in the innovation process.
Hereafter, we give a brief description of ARCH (Engle (1982)) and GARCH (Bollerslev (1986))
processes and of the procedures which are generally used to estimate them.


Let $\left \{ \varepsilon_t \right \}_{t \in \tau}$ be a stochastic process. Let $\tau \subseteq
\textbf{Z}$. $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is an ARCH($p$) if $\varepsilon_t$ can
be expressed in the following manner

\begin{subequations}
\begin{equation}
\begin{split}
\varepsilon_t&=\sqrt{h_t}u_t\\
h_t&\equiv \mathrm{V}[\varepsilon_t|\mathcal{F}_{t-
1}]=\alpha_0+\sum_{j=1}^p\alpha_j\varepsilon_{t-j}^2
\end{split}
\end{equation}

where $\left \{ u_t \right \}_{t \in \tau}$ is a white noise process such that
$\mathrm{V}[u_t]=1$, $\mathcal{F}_{t-1}$ is the information set at date $t-1$, and $\alpha_i
\in \textbf{R}_{+}$, $\forall i \in \left \{ 0,1,\dots,p \right \}$. Note that

\begin{equation}
\mathrm{E}[\varepsilon_t|\mathcal{F}_{t-1}]=\sqrt{h_t}\mathrm{E}[u_t|\mathcal{F}_{t-1}]=0
\end{equation}

Moreover, since $u_t$ is stochastically independent of $\varepsilon_s$, $\forall s < t$,
we have

\begin{equation}
\mathrm{E}[\varepsilon_t]=\mathrm{E}\left[\sqrt{h_t} \right]\mathrm{E}[u_t]=0
\end{equation}

as long as $\mathrm{E}[\sqrt{h_t}]<\infty$. Finally, note that\footnote{Let $X$ and $Y$ be two
independent random variables. Then, $f_{x,y}(x,y)=f_x(x)f_y(y)$, where $f_{\eta}(\eta)$ is the
probability density function associated to random variable $\eta$, $\eta=X,Y,\left \{ X,Y \right
\}$. $\mathrm{V}[XY] \equiv \begin{matrix} \iint f_{x,y}(x,y)(xy-\mathrm{E}[xy])^2 \,
\mathrm{d}x\, \mathrm{d}y \end{matrix}=\begin{matrix} \iint f_{x,y}(x,y)(xy-
\mathrm{E}[x]\mathrm{E}[y])^2 \, \mathrm{d}x\, \mathrm{d}y \end{matrix}$. Assume
$\mathrm{E}[y]=0$. Then, $\mathrm{V}[XY]=\begin{matrix} \iint f_{x,y}(x,y)(xy)^2 \,
\mathrm{d}x\, \mathrm{d}y \end{matrix}=\begin{matrix} \int f_x (x)x^2 \, \mathrm{d}x
\end{matrix} \begin{matrix}\int f_y (y)y^2 \, \mathrm{d}y \end{matrix} \equiv
\mathrm{E}[x^2]\mathrm{V}[y]$.}

\begin{subequations}
\begin{equation}
\mathrm{V}[\varepsilon_t]=\mathrm{E} \left[ \left( \sqrt{h_t} \right)^2
\right]\mathrm{V}[u_t]=\mathrm{E}[h_t]
\end{equation}

Assume $p=1$. Then, we have

\begin{equation}
\mathrm{V}[\varepsilon_t]=\alpha_0+\alpha_1 \mathrm{E}[\varepsilon_{t-
1}^2]=\alpha_0+\alpha_1 \mathrm{V}[\varepsilon_{t-1}]
\end{equation}

Thus, $\mathrm{V}[\varepsilon_t]=\alpha_0(1+\alpha_1+\alpha_1^2\dots+\alpha_1^{n-
1})+\alpha_1^n \mathrm{V}[\varepsilon_{t-n}]$. Assume the ARCH(1) process began infinitely
far in the past with a finite initial variance. Moreover, let $\alpha_1 \in [0,1)$. Then, we have

\begin{equation}
\lim_{n \rightarrow +\infty}\mathrm{V}[\varepsilon_t]=\frac{\alpha_0}{1-\alpha_1}
\end{equation}
\end{subequations}

That is, the stochastic process $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is unconditionally
homoskedastic. Moreover, note that

\begin{equation}
\begin{split}
\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-k}]&=\mathrm{E}\left[ \varepsilon_t\varepsilon_{t-
k} \right]=\mathrm{E} \left[ \sqrt{h_t h_{t-k}}u_t u_{t-k} \right]\\
&=\mathrm{E} \left[ \sqrt{h_t h_{t-k}} \right]\mathrm{E}[u_t]\mathrm{E}[u_{t-k}]\\
&=0
\end{split}
\end{equation}

$\forall k \in \textbf{Z} \setminus \left \{ 0 \right \}$. That is, the unconditional stochastic process $\left \{ \varepsilon_t
\right \}_{t \in \tau}$ is a white noise. Finally, note that

\begin{equation}
\mathrm{Cov}[\varepsilon_t,\varepsilon_{t-k}|\mathcal{F}_{t-w}]=\mathrm{E}[\varepsilon_t
\varepsilon_{t-k}|\mathcal{F}_{t-w}]=\varepsilon_{t-
k}\mathrm{E}[\varepsilon_t|\mathcal{F}_{t-w}]=0
\end{equation}

$\forall k \ge w$, $\forall w \in \textbf{N}^*$. Because an ARCH model generally requires
iterative estimation procedures (see below), it may be desirable to test whether such a model
is appropriate before going to the effort to estimate it. It turns out that the Lagrange multiplier
test procedure is ideal for this, as in many similar cases. Consider a generic ARCH($p$) process.
Let $\alpha_1=\alpha_2=\dots=\alpha_p=0$ under the null hypothesis of no ARCH effects (in
this case the conditional variance is simply equal to $\alpha_0$, i.e., there is no conditional
heteroskedasticity). Consider the conditional variance function $h=h(\left \langle
\vect{z},\vect{\alpha} \right \rangle)$, where $h$ is some differentiable function,
$\vect{z}'=(1,\varepsilon_{t-1}^2,\dots,\varepsilon_{t-p}^2)$, and
$\vect{\alpha}'=(\alpha_0,\alpha_1,\dots,\alpha_p)$. Under the null $h$ is a constant. Then,
one gets the usual test statistic $T\mathrm{R}^2$ (which is asymptotically distributed as a chi-
squared with $p$ degrees of freedom), where $T$ is the sample size and $\mathrm{R}^2$ is
the $\mathrm{R}^2$ of the regression of $\varepsilon_t^2$ on an intercept and $p$ lagged
values of $\varepsilon_t^2$. In practice, in order to test for the presence of ARCH($p$) effects
one can estimate the model under consideration by means of OLS, and consider the squared
residuals $\left \{ \hat{\varepsilon}_t^2 \right \}_{t \in \tau}$ associated to the regression.
Then, it is necessary to run the following auxiliary regression model

\begin{equation}
\hat{\varepsilon}_t^2=\alpha_0+\sum_{j=1}^p\alpha_j\hat{\varepsilon}_{t-j}^2+u_t
\end{equation}

Consequently, such a model is estimated by means of OLS. Finally, one needs to compute the
test statistic $T\mathrm{R}^2$, which is asymptotically distributed as a chi-squared with
$p$ degrees of freedom.
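

This Lagrange multiplier test is implemented in \texttt{statsmodels} as \texttt{het\_arch} (a
sketch, applied to the residuals of the fitted model \texttt{ols\_res} from the earlier sketch).

\begin{verbatim}
from statsmodels.stats.diagnostic import het_arch

# LM test for ARCH effects of order up to 12 in the OLS residuals.
lm_stat, lm_pval, f_stat, f_pval = het_arch(ols_res.resid, nlags=12)
\end{verbatim}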


We now turn to the description of GARCH processes. Despite their simplicity, ARCH processes
can effectively introduce the concept of autoregressive conditional heteroskedasticity in time
series analysis. Nevertheless, an ARCH model often requires many lags to adequately explain the
volatility characterizing real world time series. An alternative model seems to be needed.
Bollerslev (1986) proposed a useful extension of ARCH models known as GARCH processes.

Let $\left \{ \varepsilon_t \right \}_{t \in \tau}$ be a stochastic process. Let $\tau \subseteq
\textbf{Z}$. $\left \{ \varepsilon_t \right \}_{t \in \tau}$ is a GARCH($p,q$) if
$\varepsilon_t$ can be expressed in the following manner
\begin{equation}
\begin{split}
\varepsilon_t &= \sqrt{h_t} u_t\\
h_t& \equiv \mathrm{V}[\varepsilon_t|\mathcal{F}_{t-1}] =\alpha_0+\sum_{j=1}^p \alpha_j
\varepsilon_{t-j}^2+\sum_{j'=1}^q \delta_{j'} h_{t-j'}
\end{split}
\end{equation}

where $\left \{ u_t \right \}_{t \in \tau}$ is a white noise process such that
$\mathrm{V}[u_t]=1$, $\mathcal{F}_{t-1}$ is the information set at date $t-1$, $\alpha_i \in
\textbf{R}_{+}$, $\forall i \in \left \{ 0,1,\dots,p \right \}$, and $\delta_{i'} \in \textbf{R}_{+}$,
$\forall i' \in \left \{ 1,2,\dots,q \right \}$. Note that when $q=0$, the GARCH($p,q$) process
degenerates to an ARCH($p$). The properties of GARCH processes are similar to those of ARCH
models (see above).
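

In practice, processes of this kind can be estimated with readily available software. The
following sketch uses the Python \texttt{arch} package (the variable \texttt{dx} is assumed to
hold the first difference of the Energy CPI-U series; the skewed-$t$ specification mentioned in
the Introduction corresponds to \texttt{dist="skewt"}).

\begin{verbatim}
from arch import arch_model

# dx: first difference of the Energy CPI-U series (assumed already loaded).
# Constant mean, GARCH(1,1) variance, Gaussian innovations;
# dist="skewt" would give the skewed-t variant instead.
model = arch_model(dx, mean="Constant", vol="GARCH", p=1, q=1, dist="normal")
garch_res = model.fit(disp="off")
print(garch_res.summary())
\end{verbatim}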


Hereafter, we give a brief description of the standard approach to parameter estimation in
ARCH and GARCH models. First of all, it is useful to recall some notions in the theory of
maximum likelihood estimation. Consider a sample $\left \{ x_i \right \}_{i=1}^n$ of
$n$ independent and identically distributed random variables drawn from a distribution $f(x\,|\,\vec{\theta}\,)$,
where $x \in \mathcal{X}$ (where $\mathcal{X}$ denotes the sample space), and $\vec{\theta} \in
\Theta$ (where $\Theta$ denotes the parameter space). The joint probability density function
associated to the sample is given by

\begin{subequations}
\begin{equation}
f \left( \vec{x} \, | \, \vec{\theta} \, \right)=\prod_{i=1}^n f \left( x_i \, | \, \vec{\theta} \, \right)
\end{equation}

where we assumed $\mathcal{X}$ and $\Theta$ are Euclidean spaces. Roughly speaking,
maximum likelihood estimation works in the following manner: One should take the
aforementioned joint probability density function and conceive the observations
$x_1,\dots,x_n$ as parameters, whereas the parameters $\vec{\theta}$ should be conceived
as variables. By looking at the previous function in this manner, one gets the so-called
likelihood function. Such function is usually denoted as follows

\begin{equation}
L \left( \vec{\theta} \, | \, \vec{x} \, \right) =\prod_{i=1}^n f \left(x_i \, | \, \vec{\theta} \,
\right)
\end{equation}
\end{subequations}

In practice, it is sometimes useful to work with the logarithm of the likelihood function. The
maximum likelihood principle states that one should find the values of the parameters
$\vec{\theta}^*$ that maximize the likelihood function, if such values exist. The idea is the
following: Given a certain probability density function and a certain sample, we look for the
values of the parameters which maximize the probability of drawing such sample from such
distribution. Assume $\left \{ u_t \right \}_{t \in \tau}$ in (2.16h) is Gaussian. Then,
$\varepsilon_t|\mathcal{F}_{t-1} \sim \mathcal{N}(0,h_t)$ and
$\mathrm{Cov}[\varepsilon_t,\varepsilon_s|\mathcal{F}_{t-1}]=0$, $\forall t,s \in \tau$ such
that $t \ne s$. The probability density function associated to $\varepsilon_t$ conditional on
$\mathcal{F}_{t-1}$ is

\begin{equation}
f(\varepsilon_t|\mathcal{F}_{t-1})=\frac{1}{\sqrt{2 \pi h_t}}e^{-\frac{\varepsilon_t^2}{2h_t}}
\end{equation}

The log-likelihood function of the parameters
$\vec{\theta}=(\alpha_0,\dots,\alpha_p,\delta_1,\dots,\delta_q)'$ given the observations
$\left \{ \varepsilon_t \right \}_{t \in \tau}$ can be expressed in the following
manner\footnote{For the ease of notation, we choose to denote $\vec{\theta}$ simply as
$\theta$.}

\begin{equation}
\log L(\theta|\left \{ \varepsilon_t \right \}_{t \in \tau})=\sum_{t \in \tau} -\frac{1}{2}
\left( \log(2 \pi)+\log h_t+ \frac{\varepsilon_t^2}{h_t} \right)
\end{equation}
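
For concreteness, the following sketch (Python, illustrative only) evaluates the conditional Gaussian log-likelihood (2.16k) for an ARCH(1); the initialization of $h_1$ with the sample variance of the residuals is an arbitrary but common convention:

\begin{verbatim}
import numpy as np

def arch1_loglik(theta, eps):
    """Conditional Gaussian log-likelihood of an ARCH(1): h_t = a0 + a1 * eps_{t-1}^2."""
    a0, a1 = theta
    if a0 <= 0 or a1 < 0:
        return -np.inf                                    # enforce positivity constraints
    e2 = np.asarray(eps) ** 2
    h = a0 + a1 * np.concatenate(([e2.mean()], e2[:-1]))  # h_1 set to the sample variance
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(h) + e2 / h)
\end{verbatim}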

Thus, the log-likelihood maximization problem can be expressed as follows

\begin{equation}
\max_{(\vec{\alpha},\vec{\delta}) \, \in \, \textbf{R}_{+}^{p+1}\times\textbf{R}_{+}^q} \log L
\left( \theta|\left \{ \varepsilon_t \right \}_{t \in \tau} \right) =\sum_{t \in \tau} -\frac{1}{2}
\left( \log(2 \pi)+\log h_t+ \frac{\varepsilon_t^2}{h_t} \right)
\end{equation}

where $\vec{\alpha}=(\alpha_0,\dots,\alpha_p)$ and $\vec{\delta}=(\delta_1,\dots,\delta_q)$.
The estimates that maximize the conditional log-likelihood are called the maximum likelihood
(ML) estimates. Problem (2.16l) is usually solved by means of the Newton (or Newton-Raphson)
optimization algorithm. Roughly speaking, the Newton algorithm is an iterative
method to find stationary points (i.e., points satisfying the first order necessary conditions for
extrema, given that the objective function is differentiable and the constraint set is open). The
algorithm attempts to construct a sequence $\theta_n$, $n \in \left \{ 0,1,2,\dots \right \}$, in
$\Theta$ from an initial guess $\theta_0$ which converges to a stationary point $\theta^*$. First
of all, consider the following second order polynomial approximation of $L$\footnote{Hereafter,
we choose to indicate both the likelihood function and the log-likelihood function simply as $L$.
No confusion results.} about the current iterate $\theta_n$

\begin{equation}
L (\theta) \approx L(\theta_n)+\nabla L(\theta_n)' \Delta \theta+\frac{1}{2}\Delta \theta'
H(\theta_n) \Delta \theta
\end{equation}

where $\nabla L(\cdot)$ and $H(\cdot)$ are the gradient (or score) vector and the Hessian
matrix of $L$, respectively, and $\Delta \theta \equiv \theta-\theta_n$. Consequently, we
maximize function (2.16m) with respect to $\Delta \theta$. The First Order Necessary
Conditions (FONC) for a solution are\footnote{Given $\vect{y}=\vect{A}\vect{x}$, where
$\vect{y}$ is a $m \times 1$ Euclidean vector, $\vect{A}$ is a $m \times n$ Euclidean matrix,
and $\vect{x}$ is a $n \times 1$ Euclidean vector, $\partial \vect{y}/\partial \vect{x}=\vect{A}$.
Given $\vect{y}=\vect{x}'\vect{A}\vect{x}$, where $\vect{y} \in \textbf{R}$, $\vect{x}$ is a $n
\times 1$ Euclidean vector, and $\vect{A}$ is a symmetric $n \times n$ Euclidean matrix,
$\partial \vect{y}/\partial \vect{x}=2\vect{A}\vect{x}$.}

\begin{subequations}
\begin{equation}
\nabla L(\theta_n)+H(\theta_n)\Delta \theta \approx 0
\end{equation}

Thus, as long as $H(\theta_n)$ is invertible, we have

\begin{equation}
\Delta \theta \approx -H^{-1}(\theta_n)\nabla L(\theta_n)
\end{equation}
\end{subequations}

The Newton optimization algorithm is built in the following manner: Let us begin with an initial
guess, $\theta_0$. Assume that our guess is not too far from the global maximizer $\theta^*$ (if
such a maximizer exists) and that $L$ admits a second order polynomial approximation about
$\theta_0$. Define the following sequence

\begin{equation}
\theta_{n+1}=\theta_n -H^{-1}(\theta_n)\nabla L(\theta_n)
\end{equation}

Thus, at each stage $n$ (beginning at $0$), we consider the second order polynomial
approximation of $L$ about $\theta_n$ and we compute the step $\Delta \theta$ (i.e., we find
the successive term of the sequence, $\theta_{n+1}$) that we have to take in order to
maximize (provided $H(\theta_n)$ is negative definite) the (approximated) value of $L$ in a
neighborhood of $\theta_n$. If our initial guess is not too far from $\theta^*$, this
sequence will continue to climb the hill and eventually reach the maximizer $\theta^*$. On the
other hand, recall that a solution to the log-likelihood maximization problem (2.16l) may not
exist and that the Newton optimization algorithm may fail to detect a global maximizer even
when such a maximizer exists.
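
The following sketch (Python, illustrative only; the gradient and Hessian are obtained by central finite differences rather than analytically) implements the Newton iteration described above, and could be applied, for instance, to the ARCH(1) log-likelihood sketched earlier:

\begin{verbatim}
import numpy as np

def newton_maximize(loglik, theta0, args=(), tol=1e-6, max_iter=100, step=1e-5):
    """Newton-Raphson ascent: theta_{n+1} = theta_n - H(theta_n)^{-1} grad(theta_n)."""
    def grad(th):
        g = np.zeros_like(th)
        for i in range(th.size):
            e = np.zeros_like(th); e[i] = step
            g[i] = (loglik(th + e, *args) - loglik(th - e, *args)) / (2 * step)
        return g

    def hess(th):
        H = np.zeros((th.size, th.size))
        for i in range(th.size):
            e = np.zeros_like(th); e[i] = step
            H[:, i] = (grad(th + e) - grad(th - e)) / (2 * step)
        return 0.5 * (H + H.T)                       # symmetrize the numerical Hessian

    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        delta = np.linalg.solve(hess(theta), grad(theta))
        theta_new = theta - delta                    # Newton step
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
\end{verbatim}

In practice, maximum likelihood routines for ARCH and GARCH models use refinements of this idea (e.g., BHHH or quasi-Newton updates with line searches), which tend to be more robust than the plain iteration sketched here.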


Now that we have an insight into how parameter estimation of GARCH models is generally
performed, we can proceed with testing for the presence of ARCH effects in regression model
(2.2a). If the test confirms the presence of ARCH effects, we can continue with the
estimation of an ARCH or GARCH model.


Here we report the results of the ARCH Lagrange Multiplier (LM) test based on auxiliary
regression (2.16g), where $\left \{ \hat{\varepsilon}_t^2 \right \}_{t \in \tau}$ are the squared OLS
residuals associated to regression model (2.2a).

\begin{itemize}
\item ARCH LM test statistic: 121.998, p-value: 2.46835e-020
\end{itemize}

where we choose $p=12$. The null hypothesis of no ARCH effects is rejected. Thus, we
conclude that auxiliary regression (2.16g) with $p=12$ is jointly significant. Intuitively, the test
confirms the presence of nonlinear intertemporal dependence in the residuals associated to
regression model (2.2a).


Since the ARCH LM test indicates the presence of ARCH effects, we proceed with the
estimation of different ARCH and GARCH processes. We choose to begin with a Gaussian
ARCH(1) process (that is, an ARCH(1) process such that $u_t \sim WN\mathcal{N}(0,1)$). A
fundamental practical issue in parameter estimation of ARCH and GARCH models concerns the
correct choice of the distribution of $u_t$. In particular, the assumption of normality is not
always appropriate. However, as shown by Bollerslev and Wooldridge (1992), even when
normality is inappropriately assumed, maximizing the conditional Gaussian log-likelihood
function (2.16k) results in Quasi-Maximum Likelihood Estimates\footnote{A quasi-maximum
likelihood estimate is an estimate of a parameter vector $\theta$ obtained by maximizing a
likelihood function that may be based on a misspecified distribution (here, the Gaussian).}
(QMLEs) that are consistent and asymptotically normally distributed, provided that the
conditional mean and variance functions of the ARCH or GARCH model are correctly specified.
In addition, Bollerslev and Wooldridge (1992) derived an asymptotic covariance matrix for the
QMLEs that is robust to conditional non-normality; the corresponding robust covariance
estimators are often referred to as sandwich estimators. Consequently, it is generally
considered good practice to routinely use the sandwich covariance matrix for inference
purposes. Thus, we choose to use sandwich estimators. For
the sake of simplicity, we recall that the model we are estimating is as follows

\begin{equation}
\begin{split}
\Delta X_t&=\theta+\varepsilon_t\\
\varepsilon_t&=\sqrt{\alpha_0+\alpha_1 \varepsilon_{t-1}^2}u_t, \quad u_t \sim
WN\mathcal{N}(0,1)
\end{split}
\end{equation}

Here we report the results of the estimation of the ARCH(1) for the innovation process $\left
\{ \varepsilon_t \right \}_{t \in \tau}$ in model (2.16p), the time series graph of $\left \{ \Delta
X_t \right \}_{t \in \tau}$ together with the estimated conditional standard deviation, and the
distribution of the standardized ARCH residuals compared to a standard normal
distribution\footnote{The standardized error terms are given by
$u_t=\varepsilon_t/\sqrt{h_t}$. Recall that in the case of a Gaussian ARCH(1) process $u_t
\sim \mathcal{N}(0,1)$.}
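
Estimates of this kind can be reproduced, at least approximately, with standard software. The following sketch assumes the third-party Python package \texttt{arch} and a series \texttt{dX} holding the first difference of the Energy CPI-U; it is illustrative only and is not the software used to produce the output reported below:

\begin{verbatim}
from arch import arch_model

# Constant mean, ARCH(1) conditional variance, Gaussian innovations
am = arch_model(dX, mean='Constant', vol='ARCH', p=1, dist='normal')
res = am.fit(cov_type='robust', disp='off')  # 'robust' requests sandwich standard errors
print(res.summary())
cond_std = res.conditional_volatility        # estimated sqrt(h_t), as plotted in Figure 2.7
\end{verbatim}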

\begin{table} [h]
\caption{Estimation of Gaussian ARCH(1) model}
\centering
Conditional variance equation
\begin{tabular}{|c|c|c|c|c|c|}
\hline\hline
& Coefficient & s.e. & z-ratio & p-value & \\
\hline
$\alpha_0$ & 1.28459 & 0.447754 & 2.869 & 0.0041 & ***\\
$\alpha_1$ & 2.76789 & 0.706329 & 3.919 & 8.90e-05 & *** \\
\hline
\end{tabular}
\begin{tabular}{|c c|c c|}
\hline
$\log$-likelihood & -1456.66987 & Akaike info. criterion & 2919.33975\\
Bayesian info. criterion & 2932.93238 & Hannan-Quinn & 2924.59899\\
\hline\hline
\end{tabular}
\end{table}

\begin{figure} [h]
\caption{Gaussian ARCH(1): Residuals and Estimated Conditional Standard Deviation}
\centering
\includegraphics[width=0.6\textwidth]{time.png}
\end{figure}

\begin{figure} [h]
\caption{Gaussian ARCH(1): Estimated Density of Standardized Innovations and Standard
Normal}
\centering
\includegraphics[width=0.6\textwidth]{densities_ARCH-1_gauss.pdf}
\end{figure}

Both coefficients are significant. Visual inspection of Figure 2.7 suggests that the estimated
conditional standard deviation generally exceeds the standard deviation characterizing the
first difference of the Energy CPI-U time series. In particular, the range of the former is almost
twice as large as the range of the latter (even though the estimated conditional standard
deviation is certainly consistent with the qualitative behavior of the evolution of $\left
\{ \Delta X_t \right \}_{t \in \tau}$ in time). Finally, visual inspection of Figure 2.8 clearly
suggests that the assumption of normality of $\left \{ u_t \right \}_{t \in \tau}$ is not
appropriate.


Another important practical issue is the choice of the ARCH order $p$ and the GARCH order
$q$. In particular, the choice of $p=1$ and $q=0$ in model (2.16p) was completely arbitrary. In
practice, there is no universally accepted methodology for choosing $p$ and $q$. Instead, there are several
model selection criteria which can be used to discard a particular model in favor of another
specification. Hereafter, we give a brief description of the likelihood ratio (LR) test, the Akaike
Information Criterion (AIC) and the Bayesian (Schwarz) Information Criterion (BIC).


One straightforward manner to compare two models is by looking at the maximized value of
the log-likelihood function. A model with a greater maximized log-likelihood exhibits a better
fit, in the sense that the maximum likelihood estimates of that model are more consistent with
the assumed distribution, given the sample at hand (in our application the maximized
log-likelihoods are negative, so a better fit corresponds to a value closer to zero). As a first
step, we choose to rely on the LR test as a model selection
criterion. The test is performed by estimating two models and comparing the fit of one model
to the fit of the other. Adding explanatory variables to a model will almost always make the
model fit better (i.e., a model will have a greater log-likelihood). However, it is necessary to
test whether the observed difference in model fit is statistically significant. The LR test does
this by simply comparing the (maximized) log-likelihoods of the two models. If this difference is
statistically significant, then the less restrictive model (the one with more variables) is said to
fit the data significantly better than the more restrictive one. Given the (maximized) log-
likelihoods associated to two models, the LR test statistic is given by

\begin{equation}
LR=-2\log \left( \frac{L_{1}}{L_{2}} \right)=2(\log(L_2)-\log(L_1))
\end{equation}

where 1 indicates the more restrictive model and 2 the less restrictive one. The resulting test
statistic is distributed as a chi-squared with degrees of freedom equal to the number of
variables removed from the less restrictive model in order to obtain the more restrictive one.
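
In practice the test reduces to a one-line computation. A minimal sketch, taking the two maximized log-likelihoods and the number of restrictions as inputs:

\begin{verbatim}
from scipy import stats

def lr_test(loglik_restricted, loglik_unrestricted, df):
    """LR = 2 (log L_2 - log L_1), asymptotically chi-squared with df degrees of freedom."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, stats.chi2.sf(lr, df)
\end{verbatim}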


As an alternative model selection criterion, one can use AIC. Roughly speaking, the AIC is a
measure of the relative informative quality of a statistical model given a sample and a
distribution. Since it is a measure of the relative quality of a model, the value of the AIC
associated to a certain model has no meaning \textit{per se}: It is useful only when contrasted
with the value of the AIC associated to an alternative model. The AIC is given by

\begin{equation}
AIC=2k-2\log(L)
\end{equation}

where $k$ is the number of parameters in the model. Basically, this index takes into account
both the statistical goodness of fit (that is, the likelihood) of a model and the number of
parameters that have to be estimated to achieve this particular degree of fit, by imposing an
ad hoc (linear) penalty for increasing the number of parameters. Lower values of the index are
preferred to higher values (an increase in the number of parameters to be estimated, the
likelihood of the model being the same, is associated to an increase in the AIC, while an
increase in the goodness of fit of the model, the number of parameters being the same, is
associated to a decrease in the AIC).


Finally, one can use BIC. Again, the BIC is a measure of the relative informative quality of a
statistical model given a sample and a distribution. The BIC is given by

\begin{equation}
BIC=k\log(T)-2\log(L)
\end{equation}

where $k$ is the number of parameters to be estimated and $T$ is the sample size. Note that
BIC is increasing in $k$ and $T$ and decreasing in $L$. Thus, we prefer small values of the BIC.
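
Both criteria are straightforward to compute from the maximized log-likelihood. The following sketch also checks the Gaussian ARCH(1) values reported above, assuming $k=3$ estimated parameters ($\theta$, $\alpha_0$, $\alpha_1$) and $T=686$ usable observations:

\begin{verbatim}
import numpy as np

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, T):
    return k * np.log(T) - 2 * loglik

# Gaussian ARCH(1) from the table above: log-likelihood -1456.66987,
# assumed k = 3 parameters and T = 686 usable observations
print(aic(-1456.66987, 3))       # approximately 2919.34
print(bic(-1456.66987, 3, 686))  # approximately 2932.93
\end{verbatim}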


Using the LR test as a model selection criterion, we find that the best fitting models we can
compute are a skewed-t ARCH(4) and a skewed-t GARCH(2,1)\footnote{All the models we
specified have been estimated by means of sandwich estimators.}. According to the LR test,
the skewed-t GARCH(2,1) model is at least as good as the skewed-t ARCH(4), even if neither
specification significantly outperforms the other. If we use the AIC as model selection
criterion, results are similar: As a matter of fact, when relying on the AIC the most
preferred model among the ones we estimated is the skewed-t GARCH(2,1). An identical
conclusion is drawn if model selection is based on the BIC. Thus, for the sake of
completeness, we report the estimation output of the skewed-t GARCH(2,1) model. We recall
that the model we are estimating is as follows

\begin{equation}
\begin{split}
\Delta X_t&=\theta+\varepsilon_t\\
\varepsilon_t&=\sqrt{\alpha_0+\alpha_1 \varepsilon_{t-1}^2+\delta_1 h_{t-1}+\delta_2 h_{t-
2}}u_t, \quad u_t \sim Skewed\;t
\end{split}
\end{equation}
\end{subequations}
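
As with the Gaussian ARCH(1), a comparable specification can be sketched with the \texttt{arch} package (illustrative only; note that order conventions differ across packages, so the specification below uses one lag of $\varepsilon_t^2$ and two lags of $h_t$, matching the coefficients $\alpha_1$, $\delta_1$, $\delta_2$ reported in the table):

\begin{verbatim}
from arch import arch_model

# Constant mean, one ARCH lag and two conditional variance lags, skewed-t innovations
am = arch_model(dX, mean='Constant', vol='GARCH', p=1, q=2, dist='skewt')
res = am.fit(cov_type='robust', disp='off')
print(res.summary())
\end{verbatim}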

\begin{table} [h]
\caption{Estimation of skewed-t GARCH(2,1) model}
\centering
Conditional variance equation
\begin{tabular}{|c|c|c|c|c|c|}
\hline\hline
& Coefficient & s.e. & z-ratio & p-value & \\
\hline
$\alpha_0$ & 0.00548214 & 0.00341154 & 1.607 & 0.1081 & \\
$\alpha_1$ & 1.23510 & 0.552999 & 2.233 & 0.0255 & ** \\
$\delta_1$ & 0.248492 & 0.115485 & 2.152 & 0.0314 & ** \\
$\delta_2$ & 0.246759 & 0.0888286 & 2.778 & 0.0055 & *** \\
\hline
\end{tabular}
\begin{tabular}{|c c|c c|}
\hline
$\log$-likelihood & -926.81203 & Akaike info. criterion & 1867.62406\\
Bayesian info. criterion & 1899.34021 & Hannan-Quinn & 1879.89564\\
\hline\hline
\end{tabular}
\end{table}

\begin{figure} [h]
\caption{Skewed-t GARCH (2,1): Residuals and Estimated Conditional Standard Deviation}
\centering
\includegraphics[width=0.6\textwidth]{time2.png}
\end{figure}

\begin{figure} [h]
\caption{Skewed-t GARCH(2,1): Estimated Density of Standardized Innovations and Standard
Normal}
\centering
\includegraphics[width=0.6\textwidth]{densities_GARCH-2-1_skewed.pdf}
\end{figure}

Note (see Figure 2.9) that the estimated conditional standard deviation associated to the
skewed-t GARCH(2,1) model is quantitatively consistent with the variability characterizing the
first difference of the Energy CPI-U time series, much more so than the conditional standard
deviation associated to the Gaussian ARCH(1) model. Moreover (see Figure 2.10), the
assumption that $\left \{ u_t \right \}_{t \in \tau}$ follows a skewed-t distribution is clearly
more appropriate than the assumption of normality.

\section{Conclusions}
As we have seen, the Energy CPI-U time series is likely to be a trajectory of an ARIMA($1,1,q$)
process. We cannot estimate the latter statistical model because the autocorrelation
associated to the first difference of the Energy CPI-U time series is significant up to lag 112.
This suggests that the number of lags to be included in the MA part of the ARIMA process is
equal to 112. In order to overcome this problem we choose to aggregate the data we have,
passing from a monthly to a yearly frequency. Aggregation of the data eliminates the
persistence in the first difference of the Energy CPI-U time series, and, as a consequence, rules
out the need of adding MA terms. Thus, the statistical model we obtain for the yearly
frequency Energy CPI-U time series is

\begin{equation}
\Xi_T=\Xi_{T-1}+\nu_T
\end{equation}

where $\left \{ \nu_T \right \}_{T \in \mathbb{T}}$ is the innovation process. As for this
innovation process, we do not observe the presence of significant autocorrelation in the OLS
residuals associated to regression model (3.1)\footnote{Durbin-Watson test statistic: 2.08687,
Breusch-Godfrey test statistic at the first lag: 0.164593 with p-value: 0.685, Ljung-Box test
statistic at the first lag: 0.191888 with p-value 0.661.}. However, we find that such residuals
are characterized by heteroskedasticity\footnote{Breusch-Pagan test statistic: 61.207194 with
p-value: 0.000000, White test statistic: 18.807774 with p-value: 0.000014.}. Despite the
presence of heteroskedasticity, we do not detect significant ARCH
effects\footnote{ARCH LM test statistic at the first lag: 1.48241 with p-value: 0.223398.}: As
econometric theory predicts, temporal aggregation loses information about the underlying
data generating process. According to our estimates, it does not seem desirable to assume $\left \{ \nu_T
\right \}_{T \in \mathbb{T}}$ follows an ARCH or GARCH process. In other words, when using
yearly frequency data the ARCH effects in the OLS residuals of model (3.1) are lost, even
though heteroskedasticity is not. Finally, the OLS residuals associated to regression model
(3.1) are not normally distributed\footnote{JB test statistic: 169.297 with p-value: 1.72838e-
037.}.


Hereafter, we report the time series graph of the yearly frequency Energy CPI-U time series
and the estimate we obtain when using statistical model (3.1). We choose to carry out a
forecasting exercise using the last five years (from 2008 to 2013) as the forecasting interval.
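
Under model (3.1) the point forecast is particularly simple: With no drift term, the $h$-step-ahead forecast of a random walk is the last observed value. A minimal sketch, where \texttt{xi} is assumed to hold the yearly Energy CPI-U series up to the end of the estimation sample:

\begin{verbatim}
import numpy as np

def random_walk_forecast(xi, horizon=5):
    """h-step-ahead point forecasts of a driftless random walk: the last observed value."""
    return np.repeat(xi[-1], horizon)
\end{verbatim}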

\begin{figure} [h]
\caption{Energy CPI-U Time Series, Estimate and Forecast}
\centering
\includegraphics[width=0.6\textwidth]{Johan.pdf}
\end{figure}

Visual inspection of Figure 3.1 suggests that our forecast captures the core dynamics of the
Energy CPI-U time series, even though it fails to take into account its short run behavior.


Finally, even though the behavior of the Energy CPI-U time series is much simpler when using
yearly frequency data (residual autocorrelation and ARCH effects fade away), we recall that a
GARCH model seems to be needed when using monthly frequency data in order to take into
account much of the volatility clustering characterizing the error terms. In particular, our
estimation exercises lead to the conclusion that the monthly frequency Energy CPI-U time
series is a trajectory of the following stochastic process

\begin{equation}
\begin{split}
X_t&=X_{t-1}+\varepsilon_t\\
\varepsilon_t&=\sqrt{1.24 \varepsilon_{t-1}^2+0.25h_{t-1}+0.25h_{t-2}}u_t, \quad u_t \sim
Skewed\;t
\end{split}
\end{equation}

\end{document}
