Francis X. Diebold
University of Pennsylvania
All materials are freely available for your use, but be warned: they are highly
preliminary, significantly incomplete, and rapidly evolving. All are licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
International License. (Briefly: I retain copyright, but you can use, copy and
distribute non-commercially, so long as you give me attribution and do not
modify. To view a copy of the license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I ask that you
please cite the books whenever appropriate, as: Diebold, F.X. (year here),
Book Title Here, Department of Economics, University of Pennsylvania,
http://www.ssc.upenn.edu/~fdiebold/Textbooks.html.
Introduction to Econometrics
Numerous Communities Use Econometrics
Econometrics is Special
Let's Elaborate on the Emphasis on Prediction...
Q: What is econometrics about, broadly?
There are Many Issues Regarding Types of Recorded Economic Data
- Time series
  - Continuous recording
  - Discrete recording
  - Equally-spaced
  - Unequally-spaced
  - Common-frequency
  - Mixed-frequency
- Cross section
- Time series of cross sections
  - Balanced panel
  - Unbalanced panel
Notational Aside
A Few Leading Econometrics Web Data Resources (Clickable)
Indispensable:
- Resources for Economists (AEA)
More specialized:
- National Bureau of Economic Research
- Many more
A Few Leading Econometrics Software Environments
(Clickable)
Graphics Review
Graphics Help us to:
Histogram Revealing Distributional Shape:
1-Year Government Bond Yield
Time Series Plot Revealing Dynamics:
1-Year Government Bond Yield, Levels
Scatterplot Revealing Relationship:
1-Year and 10-Year Government Bond Yields
Some Principles of Graphical Style
Probability and Statistics Review
Moments, Sample Moments and Their Sampling Distributions
Population Moments: Expectations of Powers of R.V.s
\sigma^2 = \mathrm{var}(y) = E(y - \mu)^2.
More Population Moments
S = \frac{E(y - \mu)^3}{\sigma^3}.

K = \frac{E(y - \mu)^4}{\sigma^4}.
Covariance and Correlation
So does correlation:
\mathrm{corr}(y, x) = \frac{\mathrm{cov}(y, x)}{\sigma_y \sigma_x}.
Sampling and Estimation
Sample: \{y_t\}_{t=1}^{T} \sim \text{iid } f(y)
Sample mean:
\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t
Sample variance:
\hat{\sigma}^2 = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T}
Unbiased sample variance:
s^2 = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T - 1}
More Sample Moments
Sample skewness:
\hat{S} = \frac{\frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^3}{\hat{\sigma}^3}
Sample kurtosis:
\hat{K} = \frac{\frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^4}{\hat{\sigma}^4}
Still More Sample Moments
Sample covariance:
\widehat{\mathrm{cov}}(y, x) = \frac{1}{T} \sum_{t=1}^{T} \left[ (y_t - \bar{y})(x_t - \bar{x}) \right]
Sample correlation:
\widehat{\mathrm{corr}}(y, x) = \frac{\widehat{\mathrm{cov}}(y, x)}{\hat{\sigma}_y \hat{\sigma}_x}
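Aside (not in the original slides): a minimal numpy sketch of the sample moments just defined, using hypothetical simulated data:

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(size=1000)            # hypothetical sample {y_t}
    x = 0.5 * y + rng.normal(size=1000)

    ybar = y.mean()                               # sample mean
    sig2_hat = ((y - ybar) ** 2).mean()           # sample variance (divide by T)
    s2 = ((y - ybar) ** 2).sum() / (len(y) - 1)   # unbiased sample variance
    sig_hat = np.sqrt(sig2_hat)

    S_hat = ((y - ybar) ** 3).mean() / sig_hat ** 3   # sample skewness
    K_hat = ((y - ybar) ** 4).mean() / sig_hat ** 4   # sample kurtosis

    cov_hat = ((y - ybar) * (x - x.mean())).mean()    # sample covariance
    corr_hat = cov_hat / (y.std() * x.std())          # sample correlation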
Exact Finite-Sample Distribution of the Sample Mean
(Requires iid Normality)
\bar{y} \sim N\left( \mu, \frac{\sigma^2}{T} \right)
(and we estimate \sigma^2 consistently using s^2)
\mu \in \left[ \bar{y} \pm t_{1-\frac{\alpha}{2}}(T-1) \, \frac{s}{\sqrt{T}} \right] \text{ w.p. } 1 - \alpha
\mu = \mu_0 \;\Rightarrow\; \frac{\bar{y} - \mu_0}{s / \sqrt{T}} \sim t(T-1)
Large-Sample Distribution of the Sample Mean
(Requires iid, but not Normality)
Wages: Sample Statistics
Regression
Regression as Curve Fitting
Distributions of Log Wage, Education and Experience
Scatterplot: Log Wage vs. Education
Curve Fitting
Fit a line:
\hat{y}_t = \hat{\beta}_1 + \hat{\beta}_2 x_t
Solve:
\min_{\beta_1, \beta_2} \sum_{t=1}^{T} (y_t - \beta_1 - \beta_2 x_t)^2
Log Wage vs. Education with Superimposed Regression Line
\widehat{LWAGE} = 1.273 + .081\, EDUC
Actual Values, Fitted Values and Residuals
\hat{y}_t = \hat{\beta}_1 + \hat{\beta}_2 x_t, \quad t = 1, \ldots, T.
The residuals are the difference between actual and fitted values,
e_t = y_t - \hat{y}_t, \quad t = 1, \ldots, T.
Multiple Linear Regression (K RHS Variables)
Solve:
\min_{\beta} \sum_{t=1}^{T} (y_t - \beta_1 - \beta_2 x_{2t} - \ldots - \beta_K x_{Kt})^2
Fitted hyperplane:
\hat{y}_t = \hat{\beta}_1 + \hat{\beta}_2 x_{2t} + \hat{\beta}_3 x_{3t} + \ldots + \hat{\beta}_K x_{Kt}
More compactly:
\hat{y}_t = \sum_{i=1}^{K} \hat{\beta}_i x_{it},
where x_{1t} = 1 for all t.
Wage dataset:
\widehat{LWAGE} = .867 + .093\, EDUC + .013\, EXPER
Regression as a Probability Model
An Ideal Situation (The Ideal Conditions)
- The data-generating process (DGP) is:
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{21} & x_{31} & \ldots & x_{K1} \\ 1 & x_{22} & x_{32} & \ldots & x_{K2} \\ \vdots & & & & \vdots \\ 1 & x_{2T} & x_{3T} & \ldots & x_{KT} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{pmatrix}
Elementary Matrices and Matrix Operations
0 = \begin{pmatrix} 0 & 0 & \ldots & 0 \\ 0 & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 0 \end{pmatrix} \qquad I = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix}
We Used to Write This:
Now, Equivalently, We Write This:
y = X\beta + \varepsilon \quad (1)
\varepsilon \sim N(0, \sigma^2 I) \quad (2)
\varepsilon independent of X
That is:
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_{21} & x_{31} & \ldots & x_{K1} \\ 1 & x_{22} & x_{32} & \ldots & x_{K2} \\ \vdots & & & & \vdots \\ 1 & x_{2T} & x_{3T} & \ldots & x_{KT} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{pmatrix} \quad (1)
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & 0 & \ldots & 0 \\ 0 & \sigma^2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma^2 \end{pmatrix} \right) \quad (2)
The OLS Estimator in Matrix Notation:
\hat{\beta}_{LS} = (X'X)^{-1} X'y
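Aside (not in the original slides): a minimal numpy sketch of the matrix OLS formula above, on hypothetical simulated data; a linear solve replaces the explicit inverse for numerical stability:

    import numpy as np

    rng = np.random.default_rng(1)
    T, K = 200, 3
    X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])  # first column is the constant
    beta = np.array([1.0, 0.5, -0.25])
    y = X @ beta + rng.normal(size=T)

    # OLS estimator: beta_hat = (X'X)^{-1} X'y, computed via a linear solve
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat          # residuals
    s2 = e @ e / (T - K)          # s^2 = SSR / (T - K)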
The Ideal Conditions, Redux
Sampling Distribution of β̂_LS Under the Ideal Conditions
\hat{\beta}_{LS} \sim N(\beta, V).
Note the precise parallel with the distribution of the sample mean
in Gaussian iid environments.
Sampling Distribution of β̂_LS Under the Ideal Conditions, Less Normality
As T → ∞,
\hat{\beta}_{LS} \stackrel{a}{\sim} N(\beta, V).
Note the precise parallel with the distribution of the sample mean
in non-Gaussian iid environments.
Sample Mean (You Earlier Learned All About It)
OLS Regression (You're Now Learning All About It)
What is the Relationship?
The sample mean is just LS regression on nothing but a constant. (Prove it.)
Conditional mean:
E(y_t \mid x_{1t} = 1, x_{2t} = x_{2t}^*, \ldots, x_{Kt} = x_{Kt}^*) = \beta_1 + \beta_2 x_{2t}^* + \ldots + \beta_K x_{Kt}^*
or E(y_t \mid x_t = x_t^*) = x_t^{*\prime} \beta
Conditional variance:
\mathrm{var}(y_t \mid x_t = x_t^*) = \sigma^2
y_t \mid (x_t = x_t^*) \sim N(x_t^{*\prime} \beta, \sigma^2)
Why All the Talk About Conditional Implications?:
The Predictive Modeling Problem
A major goal in econometrics is predicting y. The question is "If a new person arrives with characteristics x*, what is my minimum-MSE prediction of her y?" The answer under quadratic loss is E(y \mid x = x^*) = x^{*\prime} \beta.
Non-operational: x^{*\prime} \beta
Operational: x^{*\prime} \hat{\beta}_{LS}
Density Prediction
Non-operational version:
y_t \mid (x_t = x_t^*) \sim N(x_t^{*\prime} \beta, \sigma^2)
Operational version:
y_t \mid (x_t = x_t^*) \sim N(x_t^{*\prime} \hat{\beta}_{LS}, s^2)
Digging More Deeply into Prediction
y_t = x_t^{\prime} \beta + \varepsilon_t, \quad t = 1, \ldots, T
\varepsilon_t \sim \text{iid } D(0, \sigma^2)
Point Prediction
Assume for the moment that we know the model parameters. That is, assume that we know β and all parameters governing D. Note that the mean and variance are in general insufficient to characterize a non-Gaussian D.
E(y_i \mid x_i = x^*) = x^{*\prime} \beta.
Analytic Density Prediction (And Hence Also Interval Prediction) for D Gaussian
Simulation Algorithm for Density Prediction for D Gaussian
Making the Forecasts Feasible
Typical Regression Analysis of Wages, Education and
Experience
Top Matter: Background Information
- Dependent variable
- Method
- Date
- Sample
- Included observations
Middle Matter: Estimated Regression Function
- Variable
- Coefficient
- Standard error
- t-statistic
- p-value
Predictive Perspectives
OLS coefficient signs and sizes give the weights put on the
various x variables in forming the best in-sample prediction of y .
Bottom Matter: Statistics
Regression Statistics: Mean dependent var 2.342
\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t
Predictive Perspectives
Regression Statistics: S.D. dependent var .561
SD = \sqrt{ \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T - 1} }
Predictive Perspectives
Regression Statistics: Sum squared resid 319.938
SSR = \sum_{t=1}^{T} e_t^2
Predictive Perspectives
Regression Statistics: F-statistic 199.626
F = \frac{(SSR_{res} - SSR)/(K - 1)}{SSR/(T - K)}
Predictive Perspectives
Regression Statistics: S.E. of regression .492
s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - K}
SER = \sqrt{s^2} = \sqrt{ \frac{\sum_{t=1}^{T} e_t^2}{T - K} }
Predictive Perspectives
Regression Statistics: R-squared .232
R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}
Regression Statistics: Adjusted R-squared .231
\bar{R}^2 = 1 - \frac{\frac{1}{T-K} \sum_{t=1}^{T} e_t^2}{\frac{1}{T-1} \sum_{t=1}^{T} (y_t - \bar{y})^2}
Predictive Perspectives
Regression Statistics: Log likelihood -938.236
Background/Detail: Regression Statistics: Log likelihood -938.236
Background/Detail: Maximum-Likelihood Estimation
y_t \sim \text{iid } N(x_t^{\prime} \beta, \sigma^2),
so that
f(y_t) = (2\pi\sigma^2)^{-\frac{1}{2}} \, e^{-\frac{1}{2\sigma^2}(y_t - x_t^{\prime}\beta)^2}.
Now by independence of the \varepsilon_t's, and hence the y_t's,
L = f(y_1, \ldots, y_T) = f(y_1) \cdots f(y_T) = \prod_{t=1}^{T} (2\pi\sigma^2)^{-\frac{1}{2}} \, e^{-\frac{1}{2\sigma^2}(y_t - x_t^{\prime}\beta)^2}
Background/Detail: Log Likelihood
\ln L = -\frac{T}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - x_t^{\prime}\beta)^2
- The log turns the product into a sum and eliminates the exponential
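Aside (not in the original slides): a minimal numpy sketch of the Gaussian regression log likelihood above; the function name and arguments are illustrative:

    import numpy as np

    def gaussian_loglik(beta, sigma2, y, X):
        """ln L = -(T/2) ln(2*pi*sigma^2) - (1/(2*sigma^2)) sum_t (y_t - x_t'beta)^2"""
        T = len(y)
        resid = y - X @ beta
        return -0.5 * T * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)

Evaluating it at the OLS estimates of β and σ² reproduces the "Log likelihood" entry in standard regression output.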
Background/Detail: Likelihood-Ratio Tests
-2(\ln L_0 - \ln L_1) \sim \chi^2_d
Regression Statistics: Schwarz criterion 1.435
Regression Statistics: Durbin-Watson stat. 1.926
Residual Scatter
Residual Plot
Predictive Perspectives
Non-Quadratic Loss
We Will Generally Use Quadratic Loss...
Simple (analytic closed-form expression, (X'X)^{-1} X'y)
...But We Can Also Consider Non-Quadratic Loss
LAD Regression (Absolute-Error Loss)
Quantile Regression (LinLin Loss)
Loss is linear with potentially different slopes on each side of 0,
check(e) = \begin{cases} a|e|, & \text{if } e \le 0 \\ b|e|, & \text{if } e > 0 \end{cases}
quantile_{100d\%}(y \mid X) = X\beta
where
d = \frac{b}{a+b} = \frac{1}{1 + a/b}
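Aside (not in the original slides): a minimal numpy sketch of the lin-lin "check" loss above; the names a, b, and check_loss are illustrative:

    import numpy as np

    def check_loss(e, a, b):
        """Lin-lin loss: a|e| for e <= 0, b|e| for e > 0.
        Minimizing its sum fits the d = b/(a+b) quantile."""
        return np.where(e <= 0, a * np.abs(e), b * np.abs(e))

    # Example: a = 9, b = 1 gives d = 1/(1 + 9/1) = 0.10, the 10th percentile
    a, b = 9.0, 1.0
    d = b / (a + b)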
Optimal Forecasts Can Be Biased
Quantile Regression (10th Percentile): LWAGE c, EDUC
[Figure: LWAGE vs. EDUC scatter with fitted 10th-percentile line]
\widehat{LWAGE} = 0.799 + 0.068\, EDUC
Quantile Regression (90th Percentile): LWAGE c, EDUC
[Figure: LWAGE vs. EDUC scatter with fitted 90th-percentile line]
\widehat{LWAGE} = 1.894 + 0.083\, EDUC
Comparison: LWAGE c, EDUC
[Figure: LWAGE vs. EDUC scatter with the fitted quantile regression lines compared]
Regression Statistics: Schwarz criterion 1.435
SIC = T^{\left(\frac{K}{T}\right)} \, \frac{\sum_{t=1}^{T} e_t^2}{T}
Regression Statistics: Akaike info criterion 1.423
AIC = e^{\left(\frac{2K}{T}\right)} \, \frac{\sum_{t=1}^{T} e_t^2}{T}
AIC = -2 \ln L + 2K
Predictive Perspectives
Estimate out-of-sample forecast accuracy (which is what we really care about) on the basis of in-sample forecast accuracy. (We want to select a forecasting model that will perform well for out-of-sample forecasting, quite apart from its in-sample fit.)
We proceed by inflating the in-sample mean-squared error (MSE), in various attempts to offset the deflation from regression fitting, to obtain a good estimate of out-of-sample MSE.
MSE = \frac{\sum_{t=1}^{T} e_t^2}{T}
s^2 = \frac{T}{T-K} \, MSE
AIC = e^{\left(\frac{2K}{T}\right)} \, MSE
SIC = T^{\left(\frac{K}{T}\right)} \, MSE
The AIC and SIC penalties have certain optimality properties.
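Aside (not in the original slides): a minimal numpy sketch of the MSE-inflation formulas above, computed from a hypothetical residual vector e; for moderate and large T the SIC penalty is the harshest:

    import numpy as np

    def selection_criteria(e, K):
        """Return in-sample MSE and its s^2, AIC, and SIC inflations."""
        T = len(e)
        mse = e @ e / T
        s2 = (T / (T - K)) * mse          # degrees-of-freedom penalty
        aic = np.exp(2 * K / T) * mse     # Akaike penalty
        sic = T ** (K / T) * mse          # Schwarz penalty
        return mse, s2, aic, sic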
Non-Normality and Outliers
What We'll Do
Recall Sample Mean Under iid With Normality
\bar{y} is MVUE, and
\bar{y} \sim N\left( \mu, \frac{\sigma^2}{T} \right)
Recall Sample Mean Under iid Without Normality
\bar{y} is BLUE, and
\bar{y} \stackrel{a}{\sim} N\left( \mu, \frac{\sigma^2}{T} \right)
OLS Under Ideal Conditions With Normality
\hat{\beta}_{LS} is MVUE, and
\hat{\beta}_{LS} \sim N\left( \beta, \sigma^2 (X'X)^{-1} \right)
OLS Under Ideal Conditions Without Normality
\hat{\beta}_{LS} is BLUE, and
\hat{\beta}_{LS} \stackrel{a}{\sim} N\left( \beta, \sigma^2 (X'X)^{-1} \right)
Detecting Non-Normality (In Data or in Residuals)
Many more
Recall Our OLS Wage Regression
OLS Residual Histogram and Statistics
QQ Plots
- To the extent that the QQ plot does not match the 45-degree line, the nature of the divergence can be very informative, as for example in indicating fat tails
OLS Wage Regression Residual QQ Plot
Detecting Outliers and Influential Observations:
OLS Residual Plot
Detecting Outliers and Influential Observations: Leave-One-Out Plot
Consider:
\hat{\beta}^{(t)}, \quad t = 1, \ldots, T
Leave-one-out plot
Wage Regression
Detecting Outliers and Influential Observations: Leverage Plot
Leverage plot
e_t and h_t are Two Key Pieces of \hat{\beta}^{(t)}
Other things equal, the larger is e_t, the larger is \hat{\beta} - \hat{\beta}^{(t)}
Other things equal, the larger is h_t, the larger is \hat{\beta} - \hat{\beta}^{(t)}
Dealing with Outliers:
Least Absolute Deviations (LAD), Again!
The LAD estimator, \hat{\beta}_{LAD}, solves:
\min_{\beta} \sum_{t=1}^{T} |\varepsilon_t|
Not as simple as OLS, but still simple
LAD regression is quantile regression with d = .5
Recall that OLS fits the conditional mean function,
\mathrm{mean}(y \mid X) = X\beta,
while LAD fits the conditional median function.
The two are equal under symmetry as with the FIC, but not under asymmetry, in which case the median is a better measure of central tendency
LAD Wage Regression Estimation
Digging into Prediction (Much) More Deeply (Again)
y_t = x_t^{\prime} \beta + \varepsilon_t, \quad t = 1, \ldots, T
\varepsilon_t \sim \text{iid } D(0, \sigma^2)
Recall Point Prediction
Assume for the moment that we know the model parameters. That is, assume that we know β and all parameters governing D. Note that the mean and variance are in general insufficient to characterize a non-Gaussian D.
E(y_i \mid x_i = x^*) = x^{*\prime} \beta.
Recall Analytic Density Prediction (And Hence Also Interval Prediction) for D Gaussian
Recall Simulation Algorithm for Density Prediction for D Gaussian
Recall Making the Forecasts Feasible
Density Prediction for D Parametric Non-Gaussian
Density Prediction for D Non-Parametric
Now assume that we know nothing about distribution D, except that it has mean 0. In addition, now that we have introduced feasible forecasts, we will stay in that world.
1. Take R draws from the regression residual density (which is an approximation to the disturbance density) by assigning probability 1/N to each regression residual and sampling with replacement.
2. Add x^{*\prime} \hat{\beta} to each draw.
3. Form a density forecast by fitting a density to the output from step 2.
4. Form a 95% interval forecast by sorting the output from step 2, and taking the left and right interval endpoints as the .025 and .975 quantiles, respectively.
As R → ∞ and N → ∞, the algorithmic results become arbitrarily accurate.
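Aside (not in the original slides): a minimal numpy sketch of steps 1, 2, and 4 of the residual-bootstrap algorithm above; the function name and inputs (residuals e, the scalar x*'β̂) are illustrative, and the sorted draws themselves stand in for an explicitly fitted density:

    import numpy as np

    def density_forecast(e, xstar_beta_hat, R=100000, rng=None):
        """Bootstrap residuals, shift by x*'beta_hat, read off a 95% interval."""
        rng = rng or np.random.default_rng(0)
        draws = rng.choice(e, size=R, replace=True)   # step 1: prob 1/N on each residual
        ystar = xstar_beta_hat + draws                # step 2
        lo, hi = np.quantile(ystar, [0.025, 0.975])   # step 4: .025 and .975 quantiles
        return ystar, (lo, hi)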
Density Forecasts for D Nonparametric and Acknowledging
Parameter Estimation Uncertainty
Algorithm for Density Forecasts for D Nonparametric and Acknowledging Parameter Estimation Uncertainty
1. Take R approximate disturbance draws by assigning probability 1/N to each regression residual and sampling with replacement.
2. Take R draws from the large-N sampling density of \hat{\beta}, namely
\hat{\beta}_{OLS} \sim N(\beta, \sigma^2 (X'X)^{-1}),
as approximated by N(\hat{\beta}, \hat{\sigma}^2 (X'X)^{-1}).
3. To each disturbance draw from 1 add the corresponding x^{*\prime} \hat{\beta} draw from 2.
4. Form a density forecast by fitting a density to the output from step 3.
5. Form a 95% interval forecast by sorting the output from step 3, and taking the left and right interval endpoints as the .025 and .975 quantiles, respectively.
As R → ∞ and N → ∞, we get precisely correct results.
Indicator Variables in Cross Sections:
Group Effects
Dummy Variables for Group Effects
Intercept dummies
Histograms for Wage Covariates
Recall Basic Wage Regression on Education and Experience
Basic Wage Regression Results
Basic Wage Regression Residual Scatter
Controlling for Sex, Race, and Union Status in the Wage Regression
Now:
Wage Regression on Education, Experience, and Group Dummies
Residual Scatter from Wage Regression on Education, Experience, and Group Dummies
Important Issues
Nonlinearity
Anscombe's Quartet
Anscombe's Quartet: Regressions
Anscombe's Quartet: Graphics
Parametric and Nonparametric Nonlinearity...
Log-Log Regression
\ln y_t = \beta_1 + \beta_2 \ln x_t + \varepsilon_t
y_t = A L_t^{\alpha} K_t^{\beta} \exp(\varepsilon_t)
\ln y_t = x_t^{\prime} \beta + \varepsilon_t
(0 < r < 1)
Under the remaining FIC (that is, dropping only linearity), NLS has a sampling distribution similar to that of LS under the FIC
Taylor Series Expansions
y_t = g(x_t, \varepsilon_t)
y_t = f(x_t) + \varepsilon_t
Taylor Series, Continued
f(x_t) \approx \beta_1 + \beta_2 x_t,
f(x_t) \approx \beta_1 + \beta_2 x_t + \beta_3 x_t^2.
A Key Insight
Assessing Non-Linearity
Basic Wage Regression
Quadratic Wage Regression
Dummy Interactions?
Everything
So Drop Dummy Interactions and Tighten the Rest
Heteroskedasticity in Cross-Section Regression
Heteroskedasticity is Another Type of Violation of the IC
(This time it's non-constant disturbance variances.)
Consider: \varepsilon \sim N(0, \Omega)
\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \ldots & 0 \\ 0 & \sigma_2^2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma_N^2 \end{pmatrix}
Causes and Consequences of Heteroskedasticity
Causes:
Can arise for many reasons
Engel curve (e.g., food expenditure vs. income) is a classic example
Consequences:
OLS estimation remains largely OK: parameter estimates are consistent but inefficient.
OLS inference is destroyed: standard errors are biased and inconsistent. Hence t statistics do not have the t distribution in finite samples and do not have the N(0, 1) distribution asymptotically.
A fix: White heteroskedasticity-robust standard errors. e.g., in EViews, instead of ls y,c,x, use ls(cov=white) y,c,x
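Aside (not in the original slides): a minimal numpy sketch of White's robust covariance estimator in its basic HC0 form, \hat{V} = (X'X)^{-1} (\sum_t e_t^2 x_t x_t') (X'X)^{-1}; this illustrates the idea rather than reproducing EViews' exact output:

    import numpy as np

    def white_se(X, e):
        """HC0 robust standard errors from regressors X and OLS residuals e."""
        XtX_inv = np.linalg.inv(X.T @ X)
        meat = X.T @ (X * (e ** 2)[:, None])   # sum_t e_t^2 x_t x_t'
        V = XtX_inv @ meat @ XtX_inv
        return np.sqrt(np.diag(V))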
Wage Regression with White Standard Errors
Detecting Heteroskedasticity
Graphical Diagnostics
Recall Our Final Wage Regression
Squared Residual vs. EDUC
The Breusch-Pagan-Godfrey Test (BPG)
BPG Test
White's Test
Simulation Algorithm for Density Prediction
D Gaussian, Heteroskedastic Disturbances
Spatial Correlation in Cross-Section Regression
Spatial Correlation is Another Type of Violation of the IC
(This time it's non-zero disturbance correlations.)
Consider: \varepsilon \sim N(0, \Omega)
\Omega = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \ldots & \sigma_{1T} \\ \sigma_{21} & \sigma_2^2 & \ldots & \sigma_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{T1} & \sigma_{T2} & \ldots & \sigma_T^2 \end{pmatrix}
Time Series
Misspecification
Non-Normality and Outliers
Indicator Variables in Time Series: Trend and Seasonality
Liquor Sales
Log Liquor Sales
Linear Deterministic Trend
Trend_t = \beta_1 + \beta_2 TIME_t
where TIME_t = t
Various Linear Trends
Linear Trend Estimation
Residual Plot
Deterministic Seasonality
Seasonal_t = \sum_{i=1}^{s} \gamma_i SEAS_{it} \quad (s \text{ seasons per year})
where SEAS_{it} = \begin{cases} 1 & \text{if observation } t \text{ falls in season } i \\ 0 & \text{otherwise} \end{cases}
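Aside (not in the original slides): a minimal numpy sketch of building the TIME trend and the s seasonal dummies above for a hypothetical monthly series (s = 12):

    import numpy as np

    T, s = 120, 12                     # ten years of monthly data, s seasons per year
    TIME = np.arange(1, T + 1)         # TIME_t = t
    season = (TIME - 1) % s            # season index 0..11 for each t
    SEAS = np.eye(s)[season]           # T x s matrix: SEAS[t, i] = 1 iff t falls in season i

    # Full set of s dummies, so no separate constant (avoids the dummy-variable trap)
    X = np.column_stack([TIME, SEAS])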
Linear Trend with Seasonal Dummies
Residual Plot
Seasonal Pattern
Nonlinearity in Time Series
Non-Linear Trend: Exponential (Log-Linear)
Trend_t = \beta_1 e^{\beta_2 TIME_t}
Figure: Various Exponential Trends
Non-Linear Trend: Quadratic
Figure: Various Quadratic Trends
Recall Log-Linear Liquor Sales Trend Estimation
Residual Plot
Log-Quadratic Liquor Sales Trend Estimation
Residual Plot
Log-Quadratic Liquor Sales Trend Estimation with Seasonal Dummies
Serial Correlation in Time-Series Regression
Serially Correlated Regression Disturbances
Leading example (AR(1) disturbance serial correlation):
y_t = x_t^{\prime} \beta + \varepsilon_t
\varepsilon_t = \phi \varepsilon_{t-1} + v_t, \quad |\phi| < 1
v_t \sim \text{iid } N(0, \sigma^2)
Serial Correlation Implies \Omega \ne \sigma^2 I
Recall heteroskedasticity: \Omega diagonal, but with different diagonal elements. Now:
\Omega = \begin{pmatrix} \gamma(0) & \gamma(1) & \ldots & \gamma(T-1) \\ \gamma(1) & \gamma(0) & \ldots & \gamma(T-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma(T-1) & \gamma(T-2) & \ldots & \gamma(0) \end{pmatrix}
where:
\gamma(\tau) = \mathrm{cov}(\varepsilon_t, \varepsilon_{t-\tau}), \quad \tau = 0, 1, 2, \ldots
Autocovariances: \gamma(\tau), \tau = 1, 2, \ldots
Autocorrelations: \rho(\tau) = \gamma(\tau)/\gamma(0), \tau = 1, 2, \ldots
Why is Neglected Serial Correlation a Problem for Prediction?
The IC involve \Omega = \sigma^2 I, and serial correlation implies \Omega \ne \sigma^2 I, so we get inconsistent s.e.'s, just as with heteroskedasticity. But that was basically inconsequential for point forecasts.
Put differently: serial correlation in forecast errors means that you can forecast your forecast errors! So something is wrong and can be improved...
What if You Don't Care About Neglected Serial Correlation?
Hard to imagine, but then use HAC (heteroskedasticity- and autocorrelation-consistent) standard errors. e.g., in EViews, instead of ls y,c,x, use ls(cov=hac) y,c,x
Trend + Seasonal Liquor Sales Regression with HAC Standard Errors
Detecting Serial Correlation
- Formal tests
  - Durbin-Watson
  - Breusch-Godfrey
Recall Our Log-Quadratic Liquor Sales Model
Formal Tests: Durbin-Watson (0.59!)
y_t = x_t^{\prime} \beta + \varepsilon_t
\varepsilon_t = \phi \varepsilon_{t-1} + v_t
v_t \sim \text{iid } N(0, \sigma^2)
We want to test H_0: \phi = 0 against H_1: \phi \ne 0
Then form:
DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}
Understanding the Durbin-Watson Statistic
DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} = \frac{\frac{1}{T}\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\frac{1}{T}\sum_{t=1}^{T} e_t^2} = \frac{\frac{1}{T}\sum_{t=2}^{T} e_t^2 + \frac{1}{T}\sum_{t=2}^{T} e_{t-1}^2 - 2\,\frac{1}{T}\sum_{t=2}^{T} e_t e_{t-1}}{\frac{1}{T}\sum_{t=1}^{T} e_t^2}
Hence as T → ∞:
DW \to \frac{\sigma^2 + \sigma^2 - 2\,\mathrm{cov}(\varepsilon_t, \varepsilon_{t-1})}{\sigma^2} = 1 + 1 - 2\,\mathrm{corr}(\varepsilon_t, \varepsilon_{t-1}) = 2(1 - \mathrm{corr}(\varepsilon_t, \varepsilon_{t-1}))
\Rightarrow DW \in [0, 4], \; DW \to 2 \text{ as } \phi \to 0, \text{ and } DW \to 0 \text{ as } \phi \to 1
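Aside (not in the original slides): a minimal numpy sketch of the DW statistic, computed from a hypothetical residual vector e:

    import numpy as np

    def durbin_watson(e):
        """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2.
        Values near 2 indicate little AR(1) correlation; near 0, strong positive correlation."""
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)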
Formal Tests: Breusch-Godfrey
y_t = x_t^{\prime} \beta + \varepsilon_t
\varepsilon_t = \phi_1 \varepsilon_{t-1} + \ldots + \phi_p \varepsilon_{t-p} + v_t
We want to test H_0: (\phi_1, \ldots, \phi_p) = 0 against H_1: (\phi_1, \ldots, \phi_p) \ne 0
BG for AR(1) Disturbances (TR^2 = 168.5, p = 0.0000)
BG for AR(4) Disturbances (TR^2 = 216.7, p = 0.0000)
BG for AR(8) Disturbances (TR^2 = 219.0, p = 0.0000)
Residual Plot
Residual Scatterplot (e_t vs. e_{t-1})
Residual Autocorrelations
Fixing the Serial Correlation Problem: Including Lags of y as Regressors
Trend + Seasonal Model with Four Lags of y
Trend + Seasonal Model with Four Lags of y: Residual Plot
Residual Scatterplot
Residual Autocorrelations
Forecasting and the "Forecasting the Right-Hand-Side Variables" Problem
What About Autoregressions?
e.g., AR(1):
y_t = \phi y_{t-1} + \varepsilon_t
Hence:
\hat{y}_{t+h,t} = \phi \, \hat{y}_{t+h-1,t}
Structural Change
Sharp Breakpoint, Exogenously Known
For simplicity of exposition, consider a bivariate regression:
y_t = \begin{cases} \beta_{11} + \beta_{21} x_t + \varepsilon_t, & t = 1, \ldots, T^* \\ \beta_{12} + \beta_{22} x_t + \varepsilon_t, & t = T^*+1, \ldots, T \end{cases}
Let
D_t = \begin{cases} 0, & t = 1, \ldots, T^* \\ 1, & t = T^*+1, \ldots, T \end{cases}
Then we can write the model as:
y_t = (\beta_{11} + (\beta_{12} - \beta_{11}) D_t) + (\beta_{21} + (\beta_{22} - \beta_{21}) D_t) x_t + \varepsilon_t
We run:
y_t \to c, D_t, x_t, D_t x_t
Use regression to test for structural change (F test)
Use regression to accommodate structural change if present.
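Aside (not in the original slides): a minimal numpy sketch of the break-dummy regression above with a known breakpoint T*, on hypothetical simulated data; the F statistic tests the two break coefficients jointly:

    import numpy as np

    rng = np.random.default_rng(2)
    T, Tstar = 200, 120
    x = rng.normal(size=T)
    D = (np.arange(1, T + 1) > Tstar).astype(float)   # 0 before the break, 1 after
    y = 1.0 + 0.5 * x + D * (0.8 + 0.4 * x) + rng.normal(size=T)  # DGP with a built-in break

    X = np.column_stack([np.ones(T), D, x, D * x])    # y_t -> c, D_t, x_t, D_t x_t
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)     # unrestricted residuals

    Xr = np.column_stack([np.ones(T), x])             # restricted model: no break
    er = y - Xr @ np.linalg.solve(Xr.T @ Xr, Xr.T @ y)

    q, K = 2, X.shape[1]                              # q = number of break restrictions
    F = ((er @ er - e @ e) / q) / (e @ e / (T - K))   # F-test of "no structural change"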
Structural Change
Sharp Breakpoint, Exogenously Known, Continued
Distributed F under the no-break null (and the rest of the IC)
Structural Change
Sharp Breakpoint, Endogenously Identified
Rolling-Window Regression for Generic Structural Change Assessment
\hat{\beta}_{t-w:t}, \quad t = w+1, \ldots, T
(w is the window width)
Expanding-Window (Recursive) Regression
for Generic Structural Change Assessment
Model:
y_t = \sum_{k=1}^{K} \beta_k x_{kt} + \varepsilon_t
\varepsilon_t \sim \text{iid } N(0, \sigma^2), \quad t = 1, \ldots, T.
The one-step-ahead recursive residuals satisfy
e_{t+1,t} \sim N(0, \sigma^2 r_t)
Standardized Recursive Residuals and CUSUM
w_{t+1,t} \equiv \frac{e_{t+1,t}}{\sigma \sqrt{r_t}}, \quad t = K, \ldots, T-1.
Then
CUSUM_t \equiv \sum_{\tau=K}^{t} w_{\tau+1,\tau}, \quad t = K, \ldots, T-1
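Aside (not in the original slides): a minimal numpy sketch of standardized recursive residuals and CUSUM, using the standard variance factor r_t = 1 + x'_{t+1}(X'_t X_t)^{-1} x_{t+1}; estimating σ from the w's is one simple choice among several:

    import numpy as np

    def cusum(y, X, K):
        """Standardized recursive residuals w and their cumulative sum, t = K,...,T-1."""
        T = len(y)
        w = []
        for t in range(K, T):                      # fit on obs 0..t-1, predict obs t
            Xt, yt = X[:t], y[:t]
            b = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
            e = y[t] - X[t] @ b                    # one-step-ahead recursive residual
            r = 1.0 + X[t] @ np.linalg.solve(Xt.T @ Xt, X[t])
            w.append(e / np.sqrt(r))
        w = np.array(w)
        sigma_hat = w.std(ddof=1)                  # one simple estimate of sigma
        return np.cumsum(w / sigma_hat)            # CUSUM_t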
Recursive Analysis: Constant Parameter Model
Recursive Analysis: Breaking Parameter Model
Heteroskedasticity in Time Series
Varieties of Random (White) Noise
Independent (strong) white noise: \varepsilon_t \sim \text{iid}(0, \sigma^2)
Gaussian white noise: \varepsilon_t \sim \text{iid } N(0, \sigma^2)
Linear Models (e.g., AR(1))
r_t = \phi r_{t-1} + \varepsilon_t
\varepsilon_t \sim \text{iid}(0, \sigma^2), \quad |\phi| < 1
ARCH(1) Process
r_t \mid \Omega_{t-1} \sim N(0, h_t)
h_t = \omega + \alpha r_{t-1}^2
E(r_t) = 0
E(r_t^2) = \frac{\omega}{1 - \alpha}
E(r_t \mid \Omega_{t-1}) = 0
E([r_t - E(r_t \mid \Omega_{t-1})]^2 \mid \Omega_{t-1}) = \omega + \alpha r_{t-1}^2
GARCH(1,1) Process (Generalized ARCH)
r_t \mid \Omega_{t-1} \sim N(0, h_t)
h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}
E(r_t) = 0
E(r_t^2) = \frac{\omega}{1 - \alpha - \beta}
E(r_t \mid \Omega_{t-1}) = 0
E([r_t - E(r_t \mid \Omega_{t-1})]^2 \mid \Omega_{t-1}) = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}
\sigma_t^2 = \gamma \sigma_{t-1}^2 + (1 - \gamma) r_t^2
\Rightarrow \sigma_t^2 = (1 - \gamma) \sum_{j} \gamma^j r_{t-j}^2
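Aside (not in the original slides): a minimal numpy sketch simulating the GARCH(1,1) process above; parameter values are illustrative and require α + β < 1 for the unconditional variance to exist:

    import numpy as np

    def simulate_garch(omega, alpha, beta, T, rng=None):
        """Simulate r_t | Omega_{t-1} ~ N(0, h_t), h_t = omega + alpha*r_{t-1}^2 + beta*h_{t-1}."""
        rng = rng or np.random.default_rng(0)
        r, h = np.zeros(T), np.zeros(T)
        h[0] = omega / (1 - alpha - beta)            # start at the unconditional variance
        r[0] = np.sqrt(h[0]) * rng.standard_normal()
        for t in range(1, T):
            h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
            r[t] = np.sqrt(h[t]) * rng.standard_normal()
        return r, h

    r, h = simulate_garch(omega=0.05, alpha=0.10, beta=0.85, T=1000)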
Tractable Maximum-Likelihood Estimation
f(r_t \mid \Omega_{t-1}; \theta) = \frac{1}{\sqrt{2\pi}} \, h_t(\theta)^{-1/2} \exp\left( -\frac{1}{2} \frac{r_t^2}{h_t(\theta)} \right),
where \theta = (\omega, \alpha, \beta)^{\prime}, so
\ln L = \text{const} - \frac{1}{2} \sum_t \ln h_t(\theta) - \frac{1}{2} \sum_t \frac{r_t^2}{h_t(\theta)}
Variations on the GARCH Theme
Regression with GARCH Disturbances
y_t = x_t^{\prime} \beta + \varepsilon_t
\varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t)
Fat-Tailed Conditional Densities: t-GARCH
So take:
r_t = \sqrt{h_t} \, \frac{t_d}{\mathrm{std}(t_d)}
Asymmetric Response and the Leverage Effect: T-GARCH
Standard GARCH: h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}
T-GARCH: h_t = \omega + \alpha r_{t-1}^2 + \gamma r_{t-1}^2 D_{t-1} + \beta h_{t-1}
D_t = \begin{cases} 1 & \text{if } r_t < 0 \\ 0 & \text{otherwise} \end{cases}
A Useful Specification Diagnostic
\varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t)
\varepsilon_t = \sqrt{h_t} \, v_t, \quad v_t \sim \text{iid } N(0, 1)
Infeasible: examine v_t = \varepsilon_t / \sqrt{h_t}. iid? Gaussian?
Feasible: examine \hat{v}_t = \hat{\varepsilon}_t / \sqrt{\hat{h}_t}. iid? Gaussian?
Conditional Mean Estimation
Conditional Variance Estimation
Autocorrelations of Squared Standardized Residuals
Distribution of Standardized Residuals
Time Series of Estimated Conditional Standard Deviations