(BLUE), i.e. they no longer have the minimum variance among the class of unbiased
estimators
• Omitted variable bias: when a relevant variable is omitted from the model, the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables; if it is uncorrelated, i.e. the variable is irrelevant for the included regressors, the coefficient estimates are unbiased but inefficient (not BLUE)
Ø Even if this condition is satisfied, the estimate of the coefficient on the constant term will be biased, which would imply that any forecasts made from the model would be biased; in addition, standard errors are biased upwards, so hypothesis tests could yield inappropriate inferences
• Adjusted R^2: modification of R^2 that takes into account the loss of degrees of freedom associated with adding extra variables, so include a variable if the adjusted R^2 rises and exclude it if the adjusted R^2 falls
o Soft criterion for model selection, as it typically selects the largest model of all
2 Stationary Models: Univariate Time Series
2.1 General
• Covariance stationary: a time series is stationary if its mean, variance and autocovariances are
constant over time, i.e. they do not depend on time t
E(y_t) = μ
E[(y_t − μ)^2] = σ^2
E[(y_t − μ)(y_{t−s} − μ)] = γ_s
o Stationarity is a necessary condition to avoid spurious regressions and to enable the
analysis of time series data and forecasting
• Innovation process ε_t: the unpredictable movements in y_t
• White noise process ε_t ~ WN(0, σ^2): a white noise process has constant (zero) mean and constant variance, and zero autocorrelations
E(ε_t) = 0
E(ε_t^2) = σ^2
E(ε_t ε_s) = 0 for s ≠ t
• The stationary models, AR(p), MA(q) and ARMA(p,q), are linear regression models, so CLRM
assumptions apply
Ø Stationarity condition: an AR(p) process is stationary if and only if all the solutions of φ(z) = 0 lie outside the unit circle, i.e. the inverse roots λ_k, k = 1, …, p, all satisfy |λ_k| < 1
Ø Wold’s decomposition theorem: a stationary AR(p) process with no constant or other deterministic terms can always be written as an MA(∞) process
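The stationarity condition can be checked numerically: all roots of the AR polynomial φ(z) = 1 − φ_1 z − … − φ_p z^p must lie outside the unit circle. A minimal sketch (the function name and example coefficients are made up for illustration):

```python
import numpy as np

def ar_is_stationary(phi):
    """Check whether an AR(p) process with coefficients phi = [phi_1, ..., phi_p]
    is stationary: all roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p
    must lie outside the unit circle."""
    # np.roots expects coefficients in descending powers of z:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.array([-c for c in phi[::-1]] + [1.0])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

# AR(1) with phi_1 = 0.5: root z = 2 is outside the unit circle -> stationary
print(ar_is_stationary([0.5]))   # True
# AR(1) with phi_1 = 1.1: root z ~ 0.91 is inside -> non-stationary
print(ar_is_stationary([1.1]))   # False
```

For an AR(1) this reduces to |φ_1| < 1, consistent with the condition stated above.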
• AR(1) model
o Unconditional mean, unconditional variance and derivation of the ACF for an AR(1) model [see notes and Brooks (2008), pp. 218-222]
• MA(q) model
o Properties: constant mean, constant variance, and non-zero autocorrelations up to lag q, which are zero thereafter
Ø Invertibility condition: an MA(q) model is invertible if and only if all the solutions of θ(z) = 0 lie outside the unit circle; an invertible MA(q) process can then be expressed as an AR(∞) model
Ø Stationarity condition: an ARMA(p,q) process is stationary if and only if all the solutions of φ(z) = 0 lie outside the unit circle
Ø Invertibility condition: an ARMA(p,q) process is invertible if and only if all the solutions of θ(z) = 0 lie outside the unit circle; the process can then be expressed as a pure AR(∞) model
• ARIMA(p,d,q) process: an ARMA(p,q) model applied to a series that has been differenced d times
• How can an ARMA(1,1) and AR(1) model be compared in terms of in-sample fit?
o R^2 cannot be used as the models do not have the same number of parameters
o Information criteria comparisons can be performed as the models have the same
dependent variable and an intercept term
o Misspecification tests and autocorrelation tests can be used to check if one of the models
does not satisfy the CLRM assumptions
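The information-criteria comparison can be sketched with the formulas AIC = ln(σ̂^2) + 2k/T and SBIC = ln(σ̂^2) + k·ln(T)/T. As a simplification, the sketch below compares AR(1) and AR(2) models fitted by OLS (an ARMA(1,1) would need non-linear estimation); all names and the simulated data are illustrative:

```python
import numpy as np

def fit_ar_ols(y, p):
    """Fit an AR(p) model with intercept by OLS; return (beta, sigma2, k)."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[p - j:len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return beta, float(np.mean(resid ** 2)), p + 1  # k = p lags + intercept

def aic(sigma2, k, T):
    return np.log(sigma2) + 2 * k / T

def sbic(sigma2, k, T):
    return np.log(sigma2) + k * np.log(T) / T

# Simulate an AR(1) series and compare AR(1) vs AR(2) fits
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()

for p in (1, 2):
    beta, s2, k = fit_ar_ols(y, p)
    print(p, aic(s2, k, len(y) - p), sbic(s2, k, len(y) - p))
```

The model with the smaller criterion value is preferred; SBIC penalizes extra parameters more heavily than AIC, in line with it selecting the smaller ARMA(2,0) model in the example below.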
• What are the consequences of a unit root in the AR part or in the MA part of an ARMA(p,q) model?
o If the process has a unit root in the AR part it would be non-stationary and exhibit some
random walk behavior: although the parameters could be still estimated e.g. by least
squares, the asymptotic distribution of the estimators would be non-standard and affect
confidence intervals and critical values of tests
o A unit root in the MA part, on the other hand, has serious consequences for the estimation of the parameters, as the prediction errors are no longer equal to the innovations ε_t
2.6 Forecasting
• A model is estimated using only the data from the in-sample period and is then used to forecast the values of the out-of-sample period, to assess whether the forecasts are close to the actual values
Ø The in-sample fit relates to the goodness of the estimation, i.e. the estimation of the in-
sample data, while the out-of-sample fit relates to the goodness of the forecast
• Conditional expectation: the expected value of y_{t+s} is taken given all information available up to and including time t, i.e. the values of y and of the error terms up to time t are known, so that E(u_{t+s} | Ω_t) = 0 for s > 0
E(y_{t+s} | Ω_t)
• Forecasting with ARMA(p,q) models: the forecast function f_{t,s} generates the s-step ahead forecasts made at time t
• Forecasting the future value of an MA(q) process: as a moving average process has only a memory
of length 𝑞, the forecasting horizon is limited, so that all forecasts of more than 𝑞 steps ahead
collapse to the intercept
o Example for an MA(3) process
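The MA(3) example can be sketched directly: forecasts use the known residuals ε_t, ε_{t−1}, ε_{t−2}, while future innovations are replaced by their conditional expectation of zero, so every forecast more than 3 steps ahead equals the intercept μ. The θ values and residuals below are made up for illustration:

```python
def ma3_forecast(mu, thetas, eps_hist, s):
    """s-step ahead forecast of an MA(3) process
    y_t = mu + e_t + theta_1 e_{t-1} + theta_2 e_{t-2} + theta_3 e_{t-3}.
    eps_hist = [e_t, e_{t-1}, e_{t-2}] are the last observed residuals."""
    f = mu
    for i, theta in enumerate(thetas, start=1):  # i = 1, 2, 3
        # e_{t+s-i} is known (in eps_hist) only if t+s-i <= t, i.e. s <= i;
        # otherwise its conditional expectation is zero and the term drops out
        if s <= i:
            f += theta * eps_hist[i - s]
    return f

thetas = [0.4, 0.25, 0.1]
resids = [0.5, -0.2, 0.3]
print(ma3_forecast(0.1, thetas, resids, 1))  # uses all three residuals
print(ma3_forecast(0.1, thetas, resids, 4))  # beyond q = 3: collapses to mu = 0.1
```

This makes the "memory of length q" concrete: the 4-step and all later forecasts are identical to the intercept.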
• Forecasting the future value of an AR(p) process: unlike a moving average process, an
autoregressive process has an infinite memory
o Example for an AR(2) process
o The s-step ahead forecast for an AR(2) process is given by the intercept plus the
coefficient on the one-period lag multiplied by the time 𝑠 − 1 forecast plus the coefficient
on the two-period lag multiplied by the 𝑠 − 2 forecast
f_{t,s} = μ + φ_1 f_{t,s−1} + φ_2 f_{t,s−2}
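The recursion can be implemented directly; with the made-up parameter values below the forecasts converge to the unconditional mean μ/(1 − φ_1 − φ_2) as the horizon grows, illustrating the infinite memory of the AR process:

```python
def ar2_forecasts(mu, phi1, phi2, y_t, y_tm1, horizon):
    """Iterate f_{t,s} = mu + phi1 * f_{t,s-1} + phi2 * f_{t,s-2},
    seeded with f_{t,0} = y_t and f_{t,-1} = y_{t-1}."""
    f = [y_tm1, y_t]  # f_{t,-1}, f_{t,0}
    for s in range(1, horizon + 1):
        f.append(mu + phi1 * f[-1] + phi2 * f[-2])
    return f[2:]  # forecasts for s = 1, ..., horizon

fc = ar2_forecasts(mu=1.0, phi1=0.5, phi2=0.2, y_t=2.0, y_tm1=1.0, horizon=50)
print(fc[0], fc[1])  # first two forecasts: 2.2 and 2.5
print(fc[-1])        # approaches 1 / (1 - 0.5 - 0.2) = 3.333...
```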
o Stage 2: estimation of the parameters of the model
§ Estimate the model: AR(p) models can be estimated using ordinary least squares
(OLS), while MA(q) and ARMA(p,q) models can be estimated using non-linear
least squares (NLS) or maximum likelihood (ML)
§ Check for significance of coefficients
o Stage 3: diagnostic checking of the model
§ Residual diagnostics: check whether the residuals are approximately a white
noise process, i.e. whether they are not linearly dependent and hence have
(partial) autocorrelations equal to 0
§ Check information criteria for model choice
Ø Models can be compared by checking the significance of coefficients, by residual
diagnostics, and by comparing information criteria: focus on information criteria
comparisons
Ø For the purpose of only forecasting, models are compared by using forecast performance
measures
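The residual white-noise check can be sketched with a hand-rolled Ljung-Box Q-statistic (the function name and the test data are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(resid, m):
    """Ljung-Box Q-statistic on the first m residual autocorrelations:
    Q = T(T+2) * sum_{k=1}^{m} rho_k^2 / (T-k), asymptotically chi2(m)
    under the null of no autocorrelation."""
    e = np.asarray(resid) - np.mean(resid)
    T = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(e[k:] * e[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (T - k)
    q *= T * (T + 2)
    return q, chi2.sf(q, df=m)

rng = np.random.default_rng(0)
q_wn, p_wn = ljung_box(rng.standard_normal(500), m=10)     # white noise residuals
q_alt, p_alt = ljung_box(np.tile([1.0, -1.0], 250), m=10)  # strongly autocorrelated
print(p_wn, p_alt)  # the autocorrelated series yields a tiny p-value
```

A small p-value rejects the null of white-noise residuals, signalling that the model has not captured all the linear dependence.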
o Total number of parameters 𝑘 = 𝑝 + 𝑞 + 1 in the selected model
Ø Information criteria give a single number per model which enables the comparison of
several models at once in order to select the appropriate lag length
• Example of information criteria for ARMA models: AIC selects an ARMA(4,4) model, while SBIC
selects an ARMA(2,0) model
• For VAR(p) models, there exist multivariate versions of the information criteria
3. Non-stationary Models: General
• A non-stationary series will exhibit the unfortunate property that values of the error term have a non-declining effect on the current value of y_t as time progresses
Ø Any shock, i.e. a change or an unexpected change in a variable or the value of the error
term, will be persistent
• Consider the general model
y_t = α + βt + φy_{t−1} + ε_t
o With φ = 1 the model contains a unit root:
y_t = α + βt + y_{t−1} + ε_t
o The differenced series Δy_t = α + βt + ε_t still has a time-varying mean and thus has to be detrended to make it stationary
• CAPM model
r_t = r_f + β(r_t^m − r_f) + ε_t
o Writing the return as r_t = (S_t − S_{t−1})/S_{t−1} = ΔS_t/S_{t−1} gives
ΔS_t/S_{t−1} = r_f − βr_f + βr_t^m + ε_t
ΔS_t/S_{t−1} = (1 − β)r_f + βr_t^m + ε_t = α + βr_t^m + ε_t, with α ≡ (1 − β)r_f
Comment by Nalan Bastürk
“First, a random walk process (with or without a drift) does not have a constant mean. To be more
precise, the mean is not finite. Stock prices are good examples of this. When there is a shock in these
prices, the effect of this shock is permanent, i.e. the whole evolution of prices is affected by this shock.
This explanation is in line with the data being non-stationary. I think what you mean by a 'pure trend'
model is the deterministic trend case. With deterministic trends, there is an overall direction in the
data, such as a positive trend. The data can be away from this trend for a short period but then it goes
back to the trend line. Another way of calling such deterministic trends is 'trend stationarity'. The
shocks in this case die out, i.e. the data moves towards the deterministic trend. It is important that
even in case of deterministic trends there is no constant mean.”
“When the first difference of data is trend-stationary, we can use stationary time series models for
the data (such as a linear regression or an AR model) but the model should include a trend term to
capture deterministic trends/trend stationarity.”
o Test hypotheses with trend: if H_0 is not rejected, the series has a unit root and is a random walk (possibly with drift); if H_0 is rejected, the series has no unit root but a deterministic trend, i.e. the series is trend-stationary
H_0: ρ = 0 and β = 0
H_1: ρ < 0 and β ≠ 0
• Augmented Dickey-Fuller (ADF) test: the basic test equation is extended by lagged values of Δy_t to account for remaining autocorrelation in the error terms, where the lag length p can be specified using information criteria comparisons
Δy_t = α + βt + ρy_{t−1} + ρ_1 Δy_{t−1} + … + ρ_{p−1} Δy_{t−p+1} + ε_t
o Assumption: as the white noise assumption for ε_t may be violated, lagged values are added to eliminate the serial correlation
o Test hypotheses with drift and trend: if H_0 is not rejected, the series has a unit root and a stochastic trend; if H_0 is rejected, the series has a deterministic trend
H_0: ρ = 0 and β = 0
H_1: ρ < 0 and β ≠ 0
• DF critical values are more negative than under the standard t-distribution due to the null hypothesis of non-stationarity, i.e. more evidence is needed against the null hypothesis in the context of unit roots
o The null hypothesis is rejected in favor of the stationary alternative if the test statistic is
more negative than the critical value
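A minimal sketch of the DF regression with drift and trend, computing only the t-statistic on ρ (the critical values come from Dickey-Fuller tables, e.g. about −3.41 at the 5% level for the trend case; the function name and simulated series are made up):

```python
import numpy as np

def df_tstat(y):
    """t-statistic on rho in the Dickey-Fuller regression
    dy_t = alpha + beta*t + rho*y_{t-1} + e_t (no augmentation lags)."""
    dy = np.diff(y)
    T = len(dy)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1), y[:-1]])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ b
    s2 = resid @ resid / (T - X.shape[1])          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
    return b[2] / np.sqrt(cov[2, 2])               # t-statistic on rho

rng = np.random.default_rng(42)
rw = np.cumsum(rng.standard_normal(400))   # random walk: unit root
wn = rng.standard_normal(400)              # white noise: clearly stationary
print(df_tstat(rw), df_tstat(wn))
# The random walk statistic is typically not far below zero, while the white
# noise statistic is strongly negative, rejecting the unit-root null.
```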
4. Stationary Multivariate Models
4.1 VAR(p) models
• Vector autoregressive VAR(p) model: specifically used to model inter-dependencies or dynamic correlations between economic time series
y_t = A_0 + Σ_{i=1}^p A_i y_{t−i} + ε_t
• If the VAR model includes contemporaneous terms, they can be taken over to the LHS and then
both sides of the equation can be pre-multiplied by the inverse of the matrix on the LHS in order
to get the standard form VAR
o General structural VAR
o Contemporaneous terms taken over to the LHS and both sides being pre-multiplied
o Standard form VAR
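The pre-multiplication step can be verified numerically. With structural form A y_t = c + B y_{t−1} + u_t (contemporaneous terms collected in A on the LHS), the standard form is y_t = A⁻¹c + A⁻¹B y_{t−1} + A⁻¹u_t; the matrices below are made up for illustration:

```python
import numpy as np

# Structural VAR(1): A y_t = c + B y_{t-1} + u_t (illustrative numbers)
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])   # contemporaneous relations on the LHS
c = np.array([0.5, 1.0])
B = np.array([[0.6, 0.1],
              [0.2, 0.5]])

A_inv = np.linalg.inv(A)
c_star = A_inv @ c   # standard-form intercept
B_star = A_inv @ B   # standard-form lag matrix

# Check: pre-multiplying the standard form by A recovers the structural coefficients
assert np.allclose(A @ B_star, B)
assert np.allclose(A @ c_star, c)
```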
4.2 VARX(p) models
• VARX(p) model: the VAR(p) model can be augmented with a vector of exogenous variables X_t times a matrix of coefficients B (e.g. by adding a deterministic trend)
y_t = A_0 + Σ_{i=1}^p A_i y_{t−i} + BX_t + ε_t
o The values of X_t are determined outside of the VAR system, i.e. there are no equations in the VAR with these variables as dependent variables
5. Non-stationary multivariate models: cointegration and vector error correction
5.1 Cointegration
• If variables with differing orders of integration are combined, the combination will have an order of integration equal to the largest; in particular, a linear combination of I(1) variables will in general also be I(1)
• Cointegration relationships in finance
o Spot and futures prices for a given commodity or asset
o Ratio of relative prices and an exchange rate
o Equity prices and dividends
Ø Market forces arising from no-arbitrage conditions suggest that there should be an
equilibrium relationship between the series concerned
• Criteria for cointegration between non-stationary series
o The series are integrated of the same order 𝑑, i.e. they are all 𝐼(𝑑)
o There exists a linear combination of the variables which is integrated of a lower order, i.e. it is I(d − b) with b > 0
Ø Many time series are non-stationary but move together over time, i.e. a cointegrating
relationship can be seen as a long term, or equilibrium relationship
• Many financial variables contain a unit root and are thus I(1): series that are I(1) will be cointegrated if there exists a cointegrating vector β such that the linear combination of the series is I(0) (stationary), i.e. the combination’s error term ε_t is I(0)
Ø Regression between I(1) variables is only valid when they are cointegrated
• The system of equations consists of three variables y_t = (x_{1t}, x_{2t}, x_{3t})′, where the error terms are uncorrelated white noise processes, i.e. they are stationary
o As x_{3t} contains a unit root, x_{1t} and x_{2t} both also contain a unit root, so they are all I(1): a system of I(1) variables will itself be I(1) and will be I(0) if the variables are cointegrated
o Cointegrating vector for x_{2t} and x_{3t}
β = (0, 1, −β_3)′
o Cointegrating vector for x_{1t}, x_{2t}, and x_{3t}
β = (1, −β_1, −β_2)′
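The idea can be illustrated on simulated data (the DGP is made up for this sketch): x_{3t} is a random walk, x_{2t} = β_3 x_{3t} plus stationary noise, so each series is I(1) while the combination x_{2t} − β_3 x_{3t} is I(0):

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta3 = 2000, 2.0
x3 = np.cumsum(rng.standard_normal(T))    # random walk, I(1)
x2 = beta3 * x3 + rng.standard_normal(T)  # I(1), cointegrated with x3
spread = x2 - beta3 * x3                  # cointegrating combination, I(0)

# The I(1) series wanders far from its starting point, so its sample variance
# is large; the stationary spread fluctuates around zero with small variance
print(np.var(x2), np.var(spread))
```

Here the cointegrating vector applied to (x_{2t}, x_{3t}) is (1, −β_3), matching the three-variable vector (0, 1, −β_3)′ above.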
o Step 2: Estimate the cointegrating regression using OLS and test whether the residuals are I(0), i.e. test for cointegration
§ Test hypotheses: if H_0 is not rejected, there is no cointegration and the appropriate strategy is to use first differences since there is no long-run relationship; if H_0 is rejected, there is cointegration and the appropriate strategy is to form and estimate an error correction model
H_0: ε_t ~ I(1)
H_1: ε_t ~ I(0)
o Step 3: Form and estimate an error correction model (ECM)
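Steps 2 and 3 can be sketched with plain OLS on simulated data (real applications use Engle-Granger critical values for the residual unit-root test; all names and parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000
x = np.cumsum(rng.standard_normal(T))       # I(1) regressor
u = np.zeros(T)
for t in range(1, T):                       # stationary AR(1) equilibrium error
    u[t] = 0.3 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u                       # cointegrated with x

# Step 2: cointegrating regression y_t = a + b x_t + eps_t by OLS
X = np.column_stack([np.ones(T), x])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
z = y - X @ np.array([a_hat, b_hat])        # residuals: I(0) if cointegrated

# Step 3: error correction model dy_t = g0 + g1*dx_t + lam*z_{t-1} + e_t
dX = np.column_stack([np.ones(T - 1), np.diff(x), z[:-1]])
(g0, g1, lam), *_ = np.linalg.lstsq(dX, np.diff(y), rcond=None)
print(b_hat, lam)  # lam should be negative: deviations from equilibrium are corrected
```

The negative coefficient on the lagged residual is the error-correction term pulling the system back towards the long-run relationship.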
6. Conditional Volatility Models: GARCH and SV models
6.1 ARCH and GARCH models
• The standard tools for time-series analysis are all concerned with whether there is a linear
relationship (or correlation) between the value of the series at one point in time and its value at
other points in time
o If there is no evidence of these relationships, the observations on the series are (linearly) uncorrelated; however, the observations may still be non-normally distributed, non-linearly related to one another (not independent), or fat-tailed
• For stock market data, levels of series are often hard to estimate, while turbulent and non-
turbulent times in terms of stock returns can often be captured using time varying volatility
models
o ARMA models are linear in mean and variance: these linear models have constant
conditional variance and thus cannot explain time varying volatility
Ø ARCH(q) and GARCH(p,q) models are linear in mean and non-linear in variance: these
non-linear models can explain time varying volatility
• ARCH(q) and GARCH(p,q) models define deterministic changes in conditional volatility and explicitly model the short-run variance, i.e. the conditional volatility at a specific time t
o Basic equation and error term for ARCH(q) and GARCH(p,q) models
y_t = ε_t
ε_t = v_t √h_t, v_t ~ NID(0,1)
o General conditional variance h_t, which is known given information at time t − 1
h_t = Var(y_t | I_{t−1}) = E(ε_t^2 | I_{t−1})
o Conditional variance of an ARCH(q) model
h_t = α_0 + α_1 ε_{t−1}^2 + … + α_q ε_{t−q}^2
o Conditional variance of a GARCH(p,q) model
h_t = α_0 + α_1 ε_{t−1}^2 + … + α_q ε_{t−q}^2 + β_1 h_{t−1} + … + β_p h_{t−p}
Ø Non-negativity constraint: ℎ# must always be strictly positive
Ø The volatility is required to be stationary
• Limitations of ARCH(q): the ARCH(q) model only allows for q previous lags of the squared error, but the number of lags required to capture all of the conditional dependence in the conditional variance may be large
Ø The ARCH(q) model is more likely to violate non-negativity constraints
Ø The ARCH(q) model is less parsimonious than the GARCH(1,1)
• ARCH(1) model
o Conditional variance (short-run variance)
h_t = α_0 + α_1 ε_{t−1}^2
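An ARCH(1) path can be simulated directly from the two equations above; with the made-up values α_0 = 0.2 and α_1 = 0.3 the unconditional variance is α_0/(1 − α_1) ≈ 0.286, which the sample variance should approximate:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha0, alpha1, T = 0.2, 0.3, 20000
eps = np.zeros(T)
h = np.zeros(T)
h[0] = alpha0 / (1 - alpha1)                  # start at the unconditional variance
eps[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = alpha0 + alpha1 * eps[t - 1] ** 2  # conditional variance h_t
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()

print(np.var(eps))  # close to 0.2 / 0.7 ~ 0.286
```

A large squared shock raises next period's variance, which produces the volatility clustering discussed below.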
6.1.1 GARCH(1,1) models
• The GARCH(1,1) model can capture (partly) leptokurtosis and volatility clustering but cannot
capture leverage effects or volatility affecting returns
o Leptokurtosis: the tendency for financial asset returns to have distributions that exhibit
fat tails and excess peakedness at the mean
o Volatility clustering: the tendency for volatility in financial markets to occur in bursts, i.e.
the tendency of large changes to follow large changes and small changes to follow small
changes
o Leverage effects: the tendency for volatility to rise more following a large price fall than
following a price rise of the same magnitude
o Volatility affecting returns can only be captured when the return equation is extended by the mean effect of volatility, γσ_{t−1} (a GARCH-in-mean term)
• GARCH(1,1) model
o Conditional variance (short-run variance)
h_t = E(ε_t^2 | I_{t−1}) = α_0 + α_1 ε_{t−1}^2 + β_1 h_{t−1}
§ Parameter restrictions for h_t to be finite and positive (sufficient conditions)
α_0 > 0, α_1 ≥ 0, β_1 ≥ 0
o Unconditional variance (long-run variance): the unconditional expectation of the variance
σ^2 = α_0 / (1 − α_1 − β_1), which requires α_1 + β_1 < 1 and α_0 > 0
Ø The parameter restrictions for the short-run and long-run variance ensure positive variance at each estimation period and forecast period, so that the non-negativity constraint is not violated
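With the made-up parameters α_0 = 0.1, α_1 = 0.1, β_1 = 0.8 the restrictions hold (α_1 + β_1 = 0.9 < 1) and the long-run variance is α_0/(1 − α_1 − β_1) = 1.0; a short simulation illustrates both the positivity of h_t and the long-run level:

```python
import numpy as np

rng = np.random.default_rng(5)
a0, a1, b1, T = 0.1, 0.1, 0.8, 50000
uncond = a0 / (1 - a1 - b1)                  # long-run variance = 1.0

eps = np.zeros(T)
h = np.full(T, uncond)                       # start at the long-run level
eps[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * h[t - 1]  # GARCH(1,1) recursion
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()

print(uncond, np.var(eps))  # sample variance should be near the long-run level
```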
• GARCH(1,1) compared to ARCH(q): the GARCH(1,1) model can be written as a restricted infinite-order ARCH model, hence the GARCH(1,1) model with only three parameters in the conditional variance equation is very parsimonious
Ø Therefore, the GARCH(1,1) will usually be sufficient to capture all of the dependence in
the conditional variance
• GARCH(1,1) compared to ARCH(1)
o The GARCH(1,1) model has three parameters, whereas the ARCH(1) model has only two, so the ARCH(1) model is more parsimonious than the GARCH(1,1) model
Ø When having no information on the explanatory power of the models, we only look at the number of parameters
• Estimation of ARCH and GARCH
o Specify the mean equation and the variance equation
o Quasi-maximum likelihood estimation: specify a log-likelihood function to find the
maximizing parameter values
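The quasi-maximum likelihood step can be sketched as follows: the Gaussian log-likelihood of ε_t | I_{t−1} ~ N(0, h_t) is maximized over (α_0, α_1, β_1). This is an illustrative sketch on simulated data (function names, initialization, and optimizer settings are assumptions), not a production estimator:

```python
import numpy as np
from scipy.optimize import minimize

def garch11_nll(params, eps):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model
    (constants dropped): 0.5 * sum(log h_t + eps_t^2 / h_t)."""
    a0, a1, b1 = params
    T = len(eps)
    h = np.empty(T)
    h[0] = np.var(eps)                        # a common initialization choice
    for t in range(1, T):
        h[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * h[t - 1]
    return 0.5 * np.sum(np.log(h) + eps ** 2 / h)

# Simulate a GARCH(1,1) series with known parameters
rng = np.random.default_rng(8)
true = np.array([0.1, 0.1, 0.8])
T = 5000
eps = np.zeros(T)
ht = 1.0
for t in range(1, T):
    ht = true[0] + true[1] * eps[t - 1] ** 2 + true[2] * ht
    eps[t] = np.sqrt(ht) * rng.standard_normal()

# Maximize the likelihood (minimize the negative log-likelihood) with bounds
# that keep the conditional variance positive
res = minimize(garch11_nll, x0=true, args=(eps,), method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, 1.0), (1e-6, 1.0)])
print(res.x)
```

The estimation is called quasi-ML because the Gaussian likelihood yields consistent estimates even when v_t is not exactly normal.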
7. Stylized Facts of Financial Asset Prices and Returns
• There are different sources and types of non-normality in return distributions
o In certain cases, normal distribution cannot be assumed (CLT)
• Unit roots and non-stationarity: (the logarithm of) the price of a financial asset is non-stationary
while the first difference is stationary, i.e. returns are stationary
o Dealing with unit roots (DF test, differencing the data)
o ARIMA models to model the non-stationary data instead of differencing
o Cointegration models: if multiple series have unit roots but they are believed to move
together (EMH)
Comment by Nalan Bastürk
“In an AR(1) model for returns, EMH implies that returns are not predictable with their past. For
example, if beta = 0.2, return today increases with the return yesterday. When we see a positive
return in the market, we can also expect tomorrow's return to be high. This would be against EMH.
Under EMH, beta should be 0.”
• Calendar effects: the time of trading has effects on conditional means and variances
o Introduction of dummy variables for “suspected” differences
• Excess volatility: price volatility is excessive relative to fundamental volatility according to the
efficient market hypothesis
o Volatility cannot be totally explained by the EMH as not enough is known about the
variation of the discount rate
• Fat tails: empirical (unconditional) distributions for asset prices and returns have found that
extreme values are more likely than would be predicted by the normal distribution
o There are typically fat tails, extreme values, extreme kurtosis or skewness in returns
o Excess volatility often arises together with volatility clustering, so that GARCH and SV
models can be used
Ø When GARCH and SV models are not sufficient to model all tail behavior, fat-
tailed error distributions can be defined within these models
o Focus on modeling tails: extreme value analysis (EVA) avoids the problem of tail
additivity associated with other distributions or conditional volatility models, i.e. the VaR
for the following 10 periods of these models cannot be simply added up
• Volatility clustering: asset price and return series exhibit volatility clusters
o GARCH and SV models to capture clustering
o Extensions of extreme value analysis to model volatility clustering
• Exchange rates: the properties of parities/no arbitrage conditions, volatility and trade, and
absence of fundamentals have to be separately understood to explicitly choose a model
o Exchange rates have unit roots
o Link between CIP and UIP, which both contain a unit root: there has to exist a
cointegration factor for CIP and UIP for the EMH for foreign exchange to hold