(BLUE), i.e. they no longer have the minimum variance among the class of unbiased
estimators
• Omitted variable bias: when a relevant variable is omitted from the model, the estimated coefficients on all the other variables will be biased and inconsistent unless the excluded variable is uncorrelated with all the included variables; if it is uncorrelated, i.e. the variable is irrelevant for the included regressors, the coefficient estimates are unbiased but inefficient (not BLUE)
Ø Even if this condition is satisfied, the estimate of the coefficient on the constant term will be biased, which would imply that any forecasts made from the model would be biased; in addition, standard errors are biased upwards, so hypothesis tests could yield inappropriate inferences
• Adjusted R^2: modification of R^2 that takes into account the loss of degrees of freedom associated with adding extra variables, so include a variable if the adjusted R^2 rises and exclude it if the adjusted R^2 falls
o Soft criterion for model selection, as it typically selects the largest model of all
2 Stationary Models: Univariate Time Series
2.1 General
• Covariance stationary: a time series is stationary if its mean, variance and autocovariances are
constant over time, i.e. they do not depend on time t
E(y_t) = μ
E[(y_t − μ)^2] = σ^2
E[(y_t − μ)(y_{t−s} − μ)] = γ_s
o Stationarity is a necessary condition to avoid spurious regressions and to enable the
analysis of time series data and forecasting
• Innovation process ε_t: the unpredictable movements in y_t
• White noise process ε_t ~ WN(0, σ^2): a white noise process has constant (zero) mean and constant variance, and zero autocorrelations
E(ε_t) = 0
E(ε_t^2) = σ^2
E(ε_t ε_s) = 0 for s ≠ t
• The stationary models, AR(p), MA(q) and ARMA(p,q), are linear regression models, so CLRM
assumptions apply
Ø Stationarity condition: an AR(p) process is stationary if and only if all the solutions of φ(z) = 0 lie outside the unit circle, i.e. the inverse roots λ_k, k = 1, …, p, all satisfy |λ_k| < 1
Ø Wold’s decomposition theorem: a stationary AR(p) process with no constant or other deterministic terms can always be written as an MA(∞) process
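The stationarity condition can be checked numerically: all roots of the AR polynomial φ(z) = 1 − φ_1 z − … − φ_p z^p must lie outside the unit circle. A minimal sketch (the function name and example coefficients are made up for illustration):

```python
import numpy as np

def ar_is_stationary(phi):
    """Check whether an AR(p) process with coefficients phi = [phi_1, ..., phi_p]
    is stationary: all roots of phi(z) = 1 - phi_1 z - ... - phi_p z^p
    must lie outside the unit circle."""
    # np.roots expects coefficients in descending powers of z:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.array([-c for c in phi[::-1]] + [1.0])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

# AR(1) with phi_1 = 0.5: root z = 2 is outside the unit circle -> stationary
print(ar_is_stationary([0.5]))   # True
# AR(1) with phi_1 = 1.1: root z ~ 0.91 is inside -> non-stationary
print(ar_is_stationary([1.1]))   # False
```

For an AR(1) this reduces to |φ_1| < 1, consistent with the condition stated above.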
• AR(1) model
o Unconditional mean, unconditional variance and derivation of the ACF for an AR(1) model [see notes and Brooks (2008), pp. 218-222]
• MA(q) model
o Properties: constant mean, constant variance, and non-zero autocorrelations up to lag q, which are zero thereafter
Ø Invertibility condition: an MA(q) model is invertible if and only if all the solutions of θ(z) = 0 lie outside the unit circle; an invertible MA(q) process can then be expressed as an AR(∞) model
Ø Stationarity condition: an ARMA(p,q) process is stationary if and only if all the solutions of φ(z) = 0 lie outside the unit circle
Ø Invertibility condition: an ARMA(p,q) process is invertible if and only if all the solutions of θ(z) = 0 lie outside the unit circle; the process can then be expressed as a pure AR(∞) model
• ARIMA(p,d,q) process: an ARMA(p,q) model applied to a series that has been differenced d times
• How can an ARMA(1,1) and AR(1) model be compared in terms of in-sample fit?
o R^2 cannot be used as the models do not have the same number of parameters
o Information criteria comparisons can be performed as the models have the same
dependent variable and an intercept term
o Misspecification tests and autocorrelation tests can be used to check if one of the models
does not satisfy the CLRM assumptions
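The information-criteria comparison can be sketched with the formulas AIC = ln(σ̂^2) + 2k/T and SBIC = ln(σ̂^2) + k·ln(T)/T. As a simplification, the sketch below compares AR(1) and AR(2) models fitted by OLS (an ARMA(1,1) would need non-linear estimation); all names and the simulated data are illustrative:

```python
import numpy as np

def fit_ar_ols(y, p):
    """Fit an AR(p) model with intercept by OLS; return (beta, sigma2, k)."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[p - j:len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return beta, float(np.mean(resid ** 2)), p + 1  # k = p lags + intercept

def aic(sigma2, k, T):
    return np.log(sigma2) + 2 * k / T

def sbic(sigma2, k, T):
    return np.log(sigma2) + k * np.log(T) / T

# Simulate an AR(1) series and compare AR(1) vs AR(2) fits
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()

for p in (1, 2):
    beta, s2, k = fit_ar_ols(y, p)
    print(p, aic(s2, k, len(y) - p), sbic(s2, k, len(y) - p))
```

The model with the smaller criterion value is preferred; SBIC penalizes extra parameters more heavily than AIC, in line with it selecting the smaller ARMA(2,0) model in the example below.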
• What are the consequences of a unit root in the AR part or in the MA part of an ARMA(p,q) model?
o If the process has a unit root in the AR part it would be non-stationary and exhibit some
random walk behavior: although the parameters could be still estimated e.g. by least
squares, the asymptotic distribution of the estimators would be non-standard and affect
confidence intervals and critical values of tests
o A unit root in the MA part, on the other hand, has serious consequences for the estimation of the parameters, as the prediction errors are no longer equal to the innovations ε_t
2.6 Forecasting
• A model is estimated using only the data from the in-sample period and is then used to forecast the values of the out-of-sample period, to assess whether the forecasts are close to the actual values
Ø The in-sample fit relates to the goodness of the estimation, i.e. the estimation of the in-
sample data, while the out-of-sample fit relates to the goodness of the forecast
• Conditional expectation: the expected value of y_{t+s} is taken given all information available up to and including time t, i.e. the values of y and of the error terms up to time t are known, so that E(u_{t+s} | Ω_t) = 0 for s > 0
E(y_{t+s} | Ω_t)
• Forecasting with ARMA(p,q) models: the forecast function f_{t,s} generates the s-step ahead forecasts made at time t
• Forecasting the future value of an MA(q) process: as a moving average process has only a memory
of length 𝑞, the forecasting horizon is limited, so that all forecasts of more than 𝑞 steps ahead
collapse to the intercept
o Example for an MA(3) process
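The MA(3) example can be sketched directly: forecasts use the known residuals ε_t, ε_{t−1}, ε_{t−2}, while future innovations are replaced by their conditional expectation of zero, so every forecast more than 3 steps ahead equals the intercept μ. The θ values and residuals below are made up for illustration:

```python
def ma3_forecast(mu, thetas, eps_hist, s):
    """s-step ahead forecast of an MA(3) process
    y_t = mu + e_t + theta_1 e_{t-1} + theta_2 e_{t-2} + theta_3 e_{t-3}.
    eps_hist = [e_t, e_{t-1}, e_{t-2}] are the last observed residuals."""
    f = mu
    for i, theta in enumerate(thetas, start=1):  # i = 1, 2, 3
        # e_{t+s-i} is known (in eps_hist) only if t+s-i <= t, i.e. s <= i;
        # otherwise its conditional expectation is zero and the term drops out
        if s <= i:
            f += theta * eps_hist[i - s]
    return f

thetas = [0.4, 0.25, 0.1]
resids = [0.5, -0.2, 0.3]
print(ma3_forecast(0.1, thetas, resids, 1))  # uses all three residuals
print(ma3_forecast(0.1, thetas, resids, 4))  # beyond q = 3: collapses to mu = 0.1
```

This makes the "memory of length q" concrete: the 4-step and all later forecasts are identical to the intercept.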
• Forecasting the future value of an AR(p) process: unlike a moving average process, an
autoregressive process has an infinite memory
o Example for an AR(2) process
o The s-step ahead forecast for an AR(2) process is given by the intercept plus the
coefficient on the one-period lag multiplied by the time 𝑠 − 1 forecast plus the coefficient
on the two-period lag multiplied by the 𝑠 − 2 forecast
f_{t,s} = μ + φ_1 f_{t,s−1} + φ_2 f_{t,s−2}
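The recursion can be implemented directly; with the made-up parameter values below the forecasts converge to the unconditional mean μ/(1 − φ_1 − φ_2) as the horizon grows, illustrating the infinite memory of the AR process:

```python
def ar2_forecasts(mu, phi1, phi2, y_t, y_tm1, horizon):
    """Iterate f_{t,s} = mu + phi1 * f_{t,s-1} + phi2 * f_{t,s-2},
    seeded with f_{t,0} = y_t and f_{t,-1} = y_{t-1}."""
    f = [y_tm1, y_t]  # f_{t,-1}, f_{t,0}
    for s in range(1, horizon + 1):
        f.append(mu + phi1 * f[-1] + phi2 * f[-2])
    return f[2:]  # forecasts for s = 1, ..., horizon

fc = ar2_forecasts(mu=1.0, phi1=0.5, phi2=0.2, y_t=2.0, y_tm1=1.0, horizon=50)
print(fc[0], fc[1])  # first two forecasts: 2.2 and 2.5
print(fc[-1])        # approaches 1 / (1 - 0.5 - 0.2) = 3.333...
```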
o Stage 2: estimation of the parameters of the model
§ Estimate the model: AR(p) models can be estimated using ordinary least squares
(OLS), while MA(q) and ARMA(p,q) models can be estimated using non-linear
least squares (NLS) or maximum likelihood (ML)
§ Check for significance of coefficients
o Stage 3: diagnostic checking of the model
§ Residual diagnostics: check whether the residuals are approximately a white
noise process, i.e. whether they are not linearly dependent and hence have
(partial) autocorrelations equal to 0
§ Check information criteria for model choice
Ø Models can be compared by checking the significance of coefficients, by residual
diagnostics, and by comparing information criteria: focus on information criteria
comparisons
Ø For the purpose of only forecasting, models are compared by using forecast performance
measures
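The residual white-noise check can be sketched with a hand-rolled Ljung-Box Q-statistic (the function name and the test data are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(resid, m):
    """Ljung-Box Q-statistic on the first m residual autocorrelations:
    Q = T(T+2) * sum_{k=1}^{m} rho_k^2 / (T-k), asymptotically chi2(m)
    under the null of no autocorrelation."""
    e = np.asarray(resid) - np.mean(resid)
    T = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, m + 1):
        rho_k = np.sum(e[k:] * e[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (T - k)
    q *= T * (T + 2)
    return q, chi2.sf(q, df=m)

rng = np.random.default_rng(0)
q_wn, p_wn = ljung_box(rng.standard_normal(500), m=10)     # white noise residuals
q_alt, p_alt = ljung_box(np.tile([1.0, -1.0], 250), m=10)  # strongly autocorrelated
print(p_wn, p_alt)  # the autocorrelated series yields a tiny p-value
```

A small p-value rejects the null of white-noise residuals, signalling that the model has not captured all the linear dependence.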
o Total number of parameters 𝑘 = 𝑝 + 𝑞 + 1 in the selected model
Ø Information criteria give a single number per model which enables the comparison of
several models at once in order to select the appropriate lag length
• Example of information criteria for ARMA models: AIC selects an ARMA(4,4) model, while SBIC
selects an ARMA(2,0) model
• For VAR(p) models, there exist multivariate versions of the information criteria
3. Non-stationary Models: General
• A non-stationary series will exhibit the unfortunate property that values of the error term have a non-declining effect on the current value of y_t as time progresses
Ø Any shock, i.e. a change or an unexpected change in a variable or the value of the error
term, will be persistent
• Consider the general model
y_t = α + βt + φy_{t−1} + ε_t
o With φ = 1 the model contains a unit root:
y_t = α + βt + y_{t−1} + ε_t
o The differenced series Δy_t = α + βt + ε_t still has a time-varying mean and thus has to be detrended to make it stationary
• CAPM model
r_t = r_f + β(r_t^m − r_f) + ε_t
o Writing the return as r_t = (S_t − S_{t−1})/S_{t−1} = ΔS_t/S_{t−1} gives
ΔS_t/S_{t−1} = r_f − βr_f + βr_t^m + ε_t
ΔS_t/S_{t−1} = (1 − β)r_f + βr_t^m + ε_t = α + βr_t^m + ε_t, with α ≡ (1 − β)r_f
Comment by Nalan Bastürk
“First, a random walk process (with or without a drift) does not have a constant mean. To be more
precise, the mean is not finite. Stock prices are good examples of this. When there is a shock in these
prices, the effect of this shock is permanent, i.e. the whole evolution of prices is affected by this shock.
This explanation is in line with the data being non-stationary. I think what you mean by a 'pure trend'
model is the deterministic trend case. With deterministic trends, there is an overall direction in the
data, such as a positive trend. The data can be away from this trend for a short period but then it goes
back to the trend line. Another way of calling such deterministic trends is 'trend stationarity'. The
shocks in this case die out, i.e. the data moves towards the deterministic trend. It is important that
even in case of deterministic trends there is no constant mean.”
“When the first difference of data is trend-stationary, we can use stationary time series models for
the data (such as a linear regression or an AR model) but the model should include a trend term to
capture deterministic trends/trend stationarity.”
o Test hypotheses with trend: if H_0 is not rejected, the series has a unit root and is a random walk (possibly with drift); if H_0 is rejected, the series has no unit root but a deterministic trend, i.e. the series is trend-stationary
H_0: ρ = 0 and β = 0
H_1: ρ < 0 and β ≠ 0
• Augmented Dickey-Fuller (ADF) test: the basic test equation is extended by lagged values of Δy_t to account for remaining autocorrelation in the error terms, where the lag length p can be specified using information criteria comparisons
Δy_t = α + βt + ρy_{t−1} + ρ_1 Δy_{t−1} + … + ρ_{p−1} Δy_{t−p+1} + ε_t
o Assumption: as the white noise assumption for ε_t may be violated, lagged values are added to eliminate the serial correlation
o Test hypotheses with drift and trend: if H_0 is not rejected, the series has a unit root and a stochastic trend; if H_0 is rejected, the series has a deterministic trend
H_0: ρ = 0 and β = 0
H_1: ρ < 0 and β ≠ 0
• DF critical values are more negative than under the standard t-distribution due to the null hypothesis of non-stationarity, i.e. more evidence is needed against the null hypothesis in the context of unit roots
o The null hypothesis is rejected in favor of the stationary alternative if the test statistic is
more negative than the critical value
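A minimal sketch of the DF regression with drift and trend, computing only the t-statistic on ρ (the critical values come from Dickey-Fuller tables, e.g. about −3.41 at the 5% level for the trend case; the function name and simulated series are made up):

```python
import numpy as np

def df_tstat(y):
    """t-statistic on rho in the Dickey-Fuller regression
    dy_t = alpha + beta*t + rho*y_{t-1} + e_t (no augmentation lags)."""
    dy = np.diff(y)
    T = len(dy)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1), y[:-1]])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ b
    s2 = resid @ resid / (T - X.shape[1])          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
    return b[2] / np.sqrt(cov[2, 2])               # t-statistic on rho

rng = np.random.default_rng(42)
rw = np.cumsum(rng.standard_normal(400))   # random walk: unit root
wn = rng.standard_normal(400)              # white noise: clearly stationary
print(df_tstat(rw), df_tstat(wn))
# The random walk statistic is typically not far below zero, while the white
# noise statistic is strongly negative, rejecting the unit-root null.
```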
4. Stationary Multivariate Models
4.1 VAR(p) models
• Vector autoregressive VAR(p) model: specifically used to model inter-dependencies or dynamic correlations between economic time series
y_t = A_0 + Σ_{i=1}^p A_i y_{t−i} + ε_t
• If the VAR model includes contemporaneous terms, they can be taken over to the LHS and then
both sides of the equation can be pre-multiplied by the inverse of the matrix on the LHS in order
to get the standard form VAR
o General structural VAR
o Contemporaneous terms taken over to the LHS and both sides being pre-multiplied
o Standard form VAR
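The pre-multiplication step can be verified numerically. With structural form A y_t = c + B y_{t−1} + u_t (contemporaneous terms collected in A on the LHS), the standard form is y_t = A⁻¹c + A⁻¹B y_{t−1} + A⁻¹u_t; the matrices below are made up for illustration:

```python
import numpy as np

# Structural VAR(1): A y_t = c + B y_{t-1} + u_t (illustrative numbers)
A = np.array([[1.0, 0.4],
              [0.3, 1.0]])   # contemporaneous relations on the LHS
c = np.array([0.5, 1.0])
B = np.array([[0.6, 0.1],
              [0.2, 0.5]])

A_inv = np.linalg.inv(A)
c_star = A_inv @ c   # standard-form intercept
B_star = A_inv @ B   # standard-form lag matrix

# Check: pre-multiplying the standard form by A recovers the structural coefficients
assert np.allclose(A @ B_star, B)
assert np.allclose(A @ c_star, c)
```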
4.2 VARX(p) models
• VARX(p) model: the VAR(p) model can be augmented with a vector of exogenous variables X_t times a matrix of coefficients B (e.g. by adding a deterministic trend)
y_t = A_0 + Σ_{i=1}^p A_i y_{t−i} + BX_t + ε_t
o The values of X_t are determined outside of the VAR system, i.e. there are no equations in the VAR with these variables as dependent variables
5. Non-stationary multivariate models: cointegration and vector error correction
5.1 Cointegration
• If variables with differing orders of integration are combined, the combination will have an order of integration equal to the largest; in particular, a linear combination of I(1) variables will in general also be I(1)
• Cointegration relationships in finance
o Spot and futures prices for a given commodity or asset
o Ratio of relative prices and an exchange rate
o Equity prices and dividends
Ø Market forces arising from no-arbitrage conditions suggest that there should be an
equilibrium relationship between the series concerned
• Criteria for cointegration between non-stationary series
o The series are integrated of the same order 𝑑, i.e. they are all 𝐼(𝑑)
o There exists a linear combination of the variables which is integrated of a lower order, i.e. it is I(d − b) with b > 0
Ø Many time series are non-stationary but move together over time, i.e. a cointegrating
relationship can be seen as a long term, or equilibrium relationship
• Many financial variables contain a unit root and are thus I(1): series that are I(1) will be cointegrated if there exists a cointegrating vector β such that the linear combination of the series is I(0) (stationary), i.e. the combination’s error term ε_t is I(0)
Ø Regression between I(1) variables is only valid when they are cointegrated
• The system of equations consists of three variables y_t = (x_{1t}, x_{2t}, x_{3t})′, where the error terms are uncorrelated white noise processes, i.e. they are stationary
o As x_{3t} contains a unit root, x_{1t} and x_{2t} both also contain a unit root, so they are all I(1): a system of I(1) variables will itself be I(1) and will be I(0) if the variables are cointegrated
o Cointegrating vector for x_{2t} and x_{3t}
β = (0, 1, −β_3)′
o Cointegrating vector for x_{1t}, x_{2t}, and x_{3t}
β = (1, −β_1, −β_2)′
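The idea can be illustrated on simulated data (the DGP is made up for this sketch): x_{3t} is a random walk, x_{2t} = β_3 x_{3t} plus stationary noise, so each series is I(1) while the combination x_{2t} − β_3 x_{3t} is I(0):

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta3 = 2000, 2.0
x3 = np.cumsum(rng.standard_normal(T))    # random walk, I(1)
x2 = beta3 * x3 + rng.standard_normal(T)  # I(1), cointegrated with x3
spread = x2 - beta3 * x3                  # cointegrating combination, I(0)

# The I(1) series wanders far from its starting point, so its sample variance
# is large; the stationary spread fluctuates around zero with small variance
print(np.var(x2), np.var(spread))
```

Here the cointegrating vector applied to (x_{2t}, x_{3t}) is (1, −β_3), matching the three-variable vector (0, 1, −β_3)′ above.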
o Step 2: Estimate the cointegrating regression using OLS and test whether the residuals are I(0), i.e. test for cointegration
§ Test hypotheses: if H_0 is not rejected, there is no cointegration and the appropriate strategy is to use first differences since there is no long-run relationship; if H_0 is rejected, there is cointegration and the appropriate strategy is to form and estimate an error correction model
H_0: ε_t ~ I(1)
H_1: ε_t ~ I(0)
o Step 3: Form and estimate an error correction model (ECM)
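Steps 2 and 3 can be sketched with plain OLS on simulated data (real applications use Engle-Granger critical values for the residual unit-root test; all names and parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000
x = np.cumsum(rng.standard_normal(T))       # I(1) regressor
u = np.zeros(T)
for t in range(1, T):                       # stationary AR(1) equilibrium error
    u[t] = 0.3 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u                       # cointegrated with x

# Step 2: cointegrating regression y_t = a + b x_t + eps_t by OLS
X = np.column_stack([np.ones(T), x])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
z = y - X @ np.array([a_hat, b_hat])        # residuals: I(0) if cointegrated

# Step 3: error correction model dy_t = g0 + g1*dx_t + lam*z_{t-1} + e_t
dX = np.column_stack([np.ones(T - 1), np.diff(x), z[:-1]])
(g0, g1, lam), *_ = np.linalg.lstsq(dX, np.diff(y), rcond=None)
print(b_hat, lam)  # lam should be negative: deviations from equilibrium are corrected
```

The negative coefficient on the lagged residual is the error-correction term pulling the system back towards the long-run relationship.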
6. Conditional Volatility Models: GARCH and SV models
6.1 ARCH and GARCH models
• The standard tools for time-series analysis are all concerned with whether there is a linear
relationship (or correlation) between the value of the series at one point in time and its value at
other points in time
o If there is no evidence of these relationships, the observations on the series are (linearly) uncorrelated; however, the observations may still be non-normally distributed, non-linearly related to one another (not independent), or fat-tailed
• For stock market data, levels of series are often hard to estimate, while turbulent and non-
turbulent times in terms of stock returns can often be captured using time varying volatility
models
o ARMA models are linear in mean and variance: these linear models have constant
conditional variance and thus cannot explain time varying volatility
Ø ARCH(q) and GARCH(p,q) models are linear in mean and non-linear in variance: these
non-linear models can explain time varying volatility
• ARCH(q) and GARCH(p,q) models define deterministic changes in conditional volatility and explicitly model the short-run variance, i.e. the conditional volatility at a specific time t
o Basic equation and error term for ARCH(q) and GARCH(p,q) models
y_t = ε_t
ε_t = v_t √h_t, v_t ~ NID(0,1)
o General conditional variance h_t, which is known given information at time t − 1
h_t = Var(y_t | I_{t−1}) = E(ε_t^2 | I_{t−1})
o Conditional variance of an ARCH(q) model
h_t = α_0 + α_1 ε_{t−1}^2 + … + α_q ε_{t−q}^2
o Conditional variance of a GARCH(p,q) model
h_t = α_0 + α_1 ε_{t−1}^2 + … + α_q ε_{t−q}^2 + β_1 h_{t−1} + … + β_p h_{t−p}
Ø Non-negativity constraint: ℎ# must always be strictly positive
Ø The volatility is required to be stationary
• Limitations of ARCH(q): the ARCH(q) model only allows for q previous lags of the squared error, but the number of lags required to capture all of the conditional dependence in the conditional variance may be large
Ø The ARCH(q) model is more likely to violate non-negativity constraints
Ø The ARCH(q) model is less parsimonious than the GARCH(1,1)
• ARCH(1) model
o Conditional variance (short-run variance)
h_t = α_0 + α_1 ε_{t−1}^2
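An ARCH(1) path can be simulated directly from the two equations above; with the made-up values α_0 = 0.2 and α_1 = 0.3 the unconditional variance is α_0/(1 − α_1) ≈ 0.286, which the sample variance should approximate:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha0, alpha1, T = 0.2, 0.3, 20000
eps = np.zeros(T)
h = np.zeros(T)
h[0] = alpha0 / (1 - alpha1)                  # start at the unconditional variance
eps[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = alpha0 + alpha1 * eps[t - 1] ** 2  # conditional variance h_t
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()

print(np.var(eps))  # close to 0.2 / 0.7 ~ 0.286
```

A large squared shock raises next period's variance, which produces the volatility clustering discussed below.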
6.1.1 GARCH(1,1) models
• The GARCH(1,1) model can capture (partly) leptokurtosis and volatility clustering but cannot
capture leverage effects or volatility affecting returns
o Leptokurtosis: the tendency for financial asset returns to have distributions that exhibit
fat tails and excess peakedness at the mean
o Volatility clustering: the tendency for volatility in financial markets to occur in bursts, i.e.
the tendency of large changes to follow large changes and small changes to follow small
changes
o Leverage effects: the tendency for volatility to rise more following a large price fall than
following a price rise of the same magnitude
o Volatility affecting returns can only be captured when the return equation is extended by the mean effect of volatility, γσ_{t−1} (a GARCH-in-mean term)
• GARCH(1,1) model
o Conditional variance (short-run variance)
h_t = E(ε_t^2 | I_{t−1}) = α_0 + α_1 ε_{t−1}^2 + β_1 h_{t−1}
§ Parameter restrictions for h_t to be finite and positive (sufficient conditions)
α_0 > 0, α_1 ≥ 0, β_1 ≥ 0
o Unconditional variance (long-run variance): the unconditional expectation of the variance
σ^2 = α_0 / (1 − α_1 − β_1), which requires α_1 + β_1 < 1 and α_0 > 0
Ø The parameter restrictions for the short-run and long-run variance ensure positive variance at each estimation period and forecast period, so that the non-negativity constraint is not violated
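With the made-up parameters α_0 = 0.1, α_1 = 0.1, β_1 = 0.8 the restrictions hold (α_1 + β_1 = 0.9 < 1) and the long-run variance is α_0/(1 − α_1 − β_1) = 1.0; a short simulation illustrates both the positivity of h_t and the long-run level:

```python
import numpy as np

rng = np.random.default_rng(5)
a0, a1, b1, T = 0.1, 0.1, 0.8, 50000
uncond = a0 / (1 - a1 - b1)                  # long-run variance = 1.0

eps = np.zeros(T)
h = np.full(T, uncond)                       # start at the long-run level
eps[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * h[t - 1]  # GARCH(1,1) recursion
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()

print(uncond, np.var(eps))  # sample variance should be near the long-run level
```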
• GARCH(1,1) compared to ARCH(q): the GARCH(1,1) model can be written as a restricted infinite-order ARCH model, hence the GARCH(1,1) model with only three parameters in the conditional variance equation is very parsimonious
Ø Therefore, the GARCH(1,1) will usually be sufficient to capture all of the dependence in
the conditional variance
• GARCH(1,1) compared to ARCH(1)
o The GARCH(1,1) model has three parameters, whereas the ARCH(1) model has only two, so the ARCH(1) model is more parsimonious than the GARCH(1,1) model
Ø When having no information on the explanatory power of the models, we only look at the number of parameters
• Estimation of ARCH and GARCH
o Specify the mean equation and the variance equation
o Quasi-maximum likelihood estimation: specify a log-likelihood function to find the
maximizing parameter values
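The quasi-maximum likelihood step can be sketched as follows: the Gaussian log-likelihood of ε_t | I_{t−1} ~ N(0, h_t) is maximized over (α_0, α_1, β_1). This is an illustrative sketch on simulated data (function names, initialization, and optimizer settings are assumptions), not a production estimator:

```python
import numpy as np
from scipy.optimize import minimize

def garch11_nll(params, eps):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model
    (constants dropped): 0.5 * sum(log h_t + eps_t^2 / h_t)."""
    a0, a1, b1 = params
    T = len(eps)
    h = np.empty(T)
    h[0] = np.var(eps)                        # a common initialization choice
    for t in range(1, T):
        h[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * h[t - 1]
    return 0.5 * np.sum(np.log(h) + eps ** 2 / h)

# Simulate a GARCH(1,1) series with known parameters
rng = np.random.default_rng(8)
true = np.array([0.1, 0.1, 0.8])
T = 5000
eps = np.zeros(T)
ht = 1.0
for t in range(1, T):
    ht = true[0] + true[1] * eps[t - 1] ** 2 + true[2] * ht
    eps[t] = np.sqrt(ht) * rng.standard_normal()

# Maximize the likelihood (minimize the negative log-likelihood) with bounds
# that keep the conditional variance positive
res = minimize(garch11_nll, x0=true, args=(eps,), method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, 1.0), (1e-6, 1.0)])
print(res.x)
```

The estimation is called quasi-ML because the Gaussian likelihood yields consistent estimates even when v_t is not exactly normal.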
7. Stylized Facts of Financial Asset Prices and Returns
• There are different sources and types of non-normality in return distributions
o In certain cases, normal distribution cannot be assumed (CLT)
• Unit roots and non-stationarity: (the logarithm of) the price of a financial asset is non-stationary
while the first difference is stationary, i.e. returns are stationary
o Dealing with unit roots (DF test, differencing the data)
o ARIMA models to model the non-stationary data instead of differencing
o Cointegration models: if multiple series have unit roots but they are believed to move
together (EMH)
Comment by Nalan Bastürk
“In an AR(1) model for returns, EMH implies that returns are not predictable with their past. For
example, if beta = 0.2, return today increases with the return yesterday. When we see a positive
return in the market, we can also expect tomorrow's return to be high. This would be against EMH.
Under EMH, beta should be 0.”
• Calendar effects: the time of trading has effects on conditional means and variances
o Introduction of dummy variables for “suspected” differences
• Excess volatility: price volatility is excessive relative to fundamental volatility according to the
efficient market hypothesis
o Volatility cannot be totally explained by the EMH as not enough is known about the
variation of the discount rate
• Fat tails: empirical (unconditional) distributions for asset prices and returns have found that
extreme values are more likely than would be predicted by the normal distribution
o There are typically fat tails, extreme values, extreme kurtosis or skewness in returns
o Excess volatility often arises together with volatility clustering, so that GARCH and SV
models can be used
Ø When GARCH and SV models are not sufficient to model all tail behavior, fat-
tailed error distributions can be defined within these models
o Focus on modeling tails: extreme value analysis (EVA) avoids the problem of tail
additivity associated with other distributions or conditional volatility models, i.e. the VaR
for the following 10 periods of these models cannot be simply added up
• Volatility clustering: asset price and return series exhibit volatility clusters
o GARCH and SV models to capture clustering
o Extensions of extreme value analysis to model volatility clustering
• Exchange rates: the properties of parities/no arbitrage conditions, volatility and trade, and
absence of fundamentals have to be separately understood to explicitly choose a model
o Exchange rates have unit roots
o Link between CIP and UIP, which both contain a unit root: there has to exist a
cointegration factor for CIP and UIP for the EMH for foreign exchange to hold