Classical Least Squares Theory - Lecture Notes

CHUNG-MING KUAN

National Taiwan University

C.-M. Kuan (Finance & CRETA, NTU) Classical Least Squares Theory October 18, 2014 1 / 100

Lecture Outline

1 The Method of Ordinary Least Squares (OLS)

Simple Linear Regression

Multiple Linear Regression

Geometric Interpretations

Measures of Goodness of Fit

Example: Analysis of Suicide Rate

2 Statistical Properties of the OLS Estimator

Classical Conditions

Without the Normality Condition

With the Normality Condition

3 Hypothesis Testing

Tests for Linear Hypotheses

Power of the Tests

Alternative Interpretation of the F Test

Confidence Regions

Example: Analysis of Suicide Rate


Lecture Outline (cont’d)

4 Multicollinearity

Near Multicollinearity

Regression with Dummy Variables

Example: Analysis of Suicide Rate

The GLS Estimator

Stochastic Properties of the GLS Estimator

The Feasible GLS Estimator

Heteroskedasticity

Serial Correlation

Application: Linear Probability Model

Application: Seemingly Unrelated Regressions


Simple Linear Regression

Suppose we are interested in the behavior of a variable y and can find another variable x that can characterize the systematic behavior of y.

x: Explanatory variable or regressor

Specifying a linear function of x: α + βx, with unknown parameters α and β

The non-systematic part is the error: y − (α + βx)

Together we write:
y = α + βx + e(α, β),
where α + βx is the linear (systematic) function and e(α, β) is the error.

For the specification α + βx, the objective is to find the “best” fit of the data (yt, xt), t = 1, . . . , T.

1. Minimizing a least-squares (LS) criterion function wrt α and β:
QT(α, β) := (1/T) Σ_{t=1}^T (yt − α − βxt)².

2. Minimizing the average absolute error:
(1/T) Σ_{t=1}^T |yt − α − βxt|.

3. Minimizing an asymmetrically weighted version of the absolute errors, with weight θ ∈ (0, 1):
(1/T) [ θ Σ_{t: yt > α+βxt} |yt − α − βxt| + (1 − θ) Σ_{t: yt < α+βxt} |yt − α − βxt| ].


The OLS Estimators

The first-order conditions of minimizing QT(α, β):
∂QT(α, β)/∂α = −(2/T) Σ_{t=1}^T (yt − α − βxt) = 0,
∂QT(α, β)/∂β = −(2/T) Σ_{t=1}^T (yt − α − βxt) xt = 0.

Solving these two equations yields the OLS estimators:
β̂T = Σ_{t=1}^T (yt − ȳ)(xt − x̄) / Σ_{t=1}^T (xt − x̄)²,   α̂T = ȳ − β̂T x̄.

The estimated regression line is ŷ = α̂T + β̂T x, which is the linear function evaluated at α̂T and β̂T, and ê = y − ŷ is the error evaluated at α̂T and β̂T, also known as the residual.

The t-th fitted value of the regression line is ŷt = α̂T + β̂T xt.

The t-th residual is êt = yt − ŷt = et(α̂T, β̂T).

No other linear function of the form a + bx can provide a better fit of the data in terms of the sum of squared errors.

β̂T measures the change in the predicted y in response to a one-unit change of x, whereas α̂T is the predicted y without (the information of) x.

Note that the OLS method requires no assumption on the data, except that xt cannot be a constant.

Algebraic Properties

Evaluating the first-order conditions at (α̂T, β̂T):
(1/T) Σ_{t=1}^T (yt − α̂T − β̂T xt) = 0,   (1/T) Σ_{t=1}^T (yt − α̂T − β̂T xt) xt = 0.

These imply:

Σ_{t=1}^T êt = 0.

Σ_{t=1}^T êt xt = 0.

Σ_{t=1}^T yt = Σ_{t=1}^T ŷt, so that ȳ = ŷ̄.

ȳ = α̂T + β̂T x̄; that is, the estimated regression line must pass through the point (x̄, ȳ).
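These closed-form estimators and algebraic identities are easy to check numerically. A minimal Python sketch with NumPy; the data below are made up purely for illustration:

```python
import numpy as np

# Hypothetical data (T = 5), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS estimators from the first-order conditions.
beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

y_fit = alpha_hat + beta_hat * x   # fitted values
resid = y - y_fit                  # residuals

# Algebraic properties: residuals sum to zero, are orthogonal to x,
# and the fitted line passes through (x-bar, y-bar).
print(beta_hat, alpha_hat)
```

The two orthogonality conditions hold exactly (up to floating-point error) because they are the first-order conditions themselves, not statistical assumptions.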

Example: Analysis of Suicide Rate

Suppose we want to know how the suicide rate (s) in Taiwan can be explained by the unemployment rate (u), the GDP growth rate (g), or time (t). The suicide rate is per 100,000 persons.

Data (1981–2013): s̄ = 12.05 with s.d. 3.91; ḡ = 5.64 with s.d. 3.16; ū = 3.09 with s.d. 1.33.

Estimation results:
ŝt = 15.56 − 0.61 gt−1, R̄² = 0.21;
ŝt = 4.40 + 2.48 ut, R̄² = 0.70;
ŝt = 4.84 + 2.40 ut−1, R̄² = 0.68;
ŝt = 7.40 + 0.27 t, R̄² = 0.44.

[Figure: Suicide Rates 1981–2013. (a) Suicide rate and real GDP growth rate; (b) suicide rate and unemployment rate.]

Multiple Linear Regression

With k explanatory variables:
y = β1 x1 + · · · + βk xk + e(β1, . . . , βk).

In matrix form,
y = Xβ + e(β),   (1)
where β = (β1 β2 · · · βk)′, y = (y1 y2 · · · yT)′, X is the T × k matrix whose (t, i)-th element is xti, and e(β) = (e1(β) e2(β) · · · eT(β))′.

Least-squares criterion function:
QT(β) := (1/T) e(β)′e(β) = (1/T)(y − Xβ)′(y − Xβ).   (2)

The FOCs of minimizing QT(β) are −2X′(y − Xβ)/T = 0, leading to the normal equations:
X′Xβ = X′y.

[ID-1] X has full column rank k: no column of X is a linear combination of the other columns.
Intuition: X does not contain redundant information.

When X is not of full column rank, we say there exists exact multicollinearity among the regressors.

Given [ID-1], X′X is positive definite and hence invertible. The unique solution to the normal equations is known as the OLS estimator of β:
β̂T = (X′X)⁻¹X′y.

The result below requires only the identification requirement and does not depend on the statistical properties of y and X.

Theorem 3.1
Given specification (1), suppose [ID-1] holds. Then, the OLS estimator β̂T = (X′X)⁻¹X′y uniquely minimizes the criterion function (2).
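Theorem 3.1 translates directly into code. A Python sketch with simulated data (in numerical practice one solves the normal equations X′Xβ = X′y rather than forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 50, 3
# Design matrix with a constant and two random regressors (simulated data).
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=T)

# Solve the normal equations X'X beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e_hat = y - X @ beta_hat
print(X.T @ e_hat)   # ~ 0: residuals are orthogonal to every column of X
```

The orthogonality X′ê = 0 is again an algebraic identity, holding regardless of whether the model is correctly specified.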

The magnitude of β̂ T is affected by the measurement units of the

dependent and explanatory variables.

A larger coefficient does not imply that the associated regressor is more

important.

The so-called “beta coefficients” (see homework) do not depend on the

measurement units, and hence their magnitudes are comparable.

Given β̂ T , the vector of the OLS fitted values is ŷ = Xβ̂ T , and the

vector of the OLS residuals is ê = y − ŷ = e(β̂ T ).

Plugging β̂T into the FOCs X′(y − Xβ) = 0, we have:
X′ê = 0.

When X contains a vector of ones, Σ_{t=1}^T êt = 0.

ŷ′ê = β̂T′ X′ê = 0.

These are all algebraic results under the OLS method.

Geometric Interpretations

P := X(X′X)⁻¹X′ is the orthogonal projection matrix that projects vectors onto span(X), and IT − P is the orthogonal projection matrix that projects vectors onto span(X)⊥, the orthogonal complement of span(X). Thus, PX = X and (IT − P)X = 0.

The vector of fitted values, ŷ = Xβ̂T = Py, is the orthogonal projection of y onto span(X).

The residual vector, ê = y − ŷ = (IT − P)y, is the orthogonal projection of y onto span(X)⊥.

ê is orthogonal to X, i.e., X′ê = 0, and it is also orthogonal to ŷ because ŷ is in span(X), i.e., ŷ′ê = 0.

[Figure: Orthogonal projection of y onto span(x1, x2): Py = x1β̂1 + x2β̂2, with residual ê = (I − P)y.]

Theorem 3.3 (Frisch-Waugh-Lovell)
Given y = X1β1 + X2β2 + e, the OLS estimator of β1 can be computed as
β̂1,T = [X1′(I − P2)X1]⁻¹ X1′(I − P2)y,
where P2 = X2(X2′X2)⁻¹X2′, and (I − P2)y and (I − P2)X1 are the residual vectors of regressing y on X2 and X1 on X2, respectively. That is, β̂1,T characterizes the marginal effect of X1 on y, after the effect of X2 is purged away.

Similarly, regressing (I − P1)y on (I − P1)X2 yields β̂2,T. That is, β̂2,T characterizes the marginal effect of X2 on y, after the effect of X1 is purged away.

Proof: Writing y = X1β̂1,T + X2β̂2,T + (I − P)y, where P = X(X′X)⁻¹X′ with X = [X1 X2], premultiply both sides by X1′(I − P2). Since (I − P2)X2 = 0, (I − P2)(I − P) = I − P, and X1′(I − P) = 0, we obtain
X1′(I − P2)y = X1′(I − P2)X1 β̂1,T,
which yields the claimed expression for β̂1,T.

Some Implications of the FWL Theorem

Regressing y on X1 alone does not yield β̂1,T of the full regression, unless X1 and X2 are orthogonal to each other.

Regressing y on X1 and X2 enables us to assess the marginal effect of X1 on y while “controlling” the effect of X2, and also the marginal effect of X2 on y while “controlling” the effect of X1.

When one regresses y on X1 only, the estimated marginal effect of X1 is affected by other relevant variables not included in the regression.
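The FWL result is easy to verify numerically: the coefficients on X1 from the full regression equal the coefficients from regressing the residualized y on the residualized X1. A Python sketch with simulated data (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
X1 = rng.normal(size=(T, 2))
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X1 @ np.array([1.5, -2.0]) + X2 @ np.array([0.3, 0.7]) + rng.normal(size=T)

def ols(X, y):
    """Solve the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Full regression of y on [X1 X2]; keep the coefficients on X1.
X = np.hstack([X1, X2])
b_full = ols(X, y)[:2]

# FWL: purge X2 from both y and X1, then regress the residuals.
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
M2 = np.eye(T) - P2
b_fwl = ols(M2 @ X1, M2 @ y)

print(np.allclose(b_full, b_fwl))
```

The agreement is exact (up to floating-point error), since the theorem is an algebraic identity.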

Observe that (I − P1)y = (I − P1)X2β̂2,T + (I − P1)(I − P)y. As (I − P1)(I − P) = I − P, the residual vector of regressing (I − P1)y on (I − P1)X2 is identical to the residual vector of regressing y on X = [X1 X2]: ê = (I − P)y.

Moreover, the direct projection of y on span(X1) (i.e., P1y) is equivalent to iterated projections of y on span(X) and then on span(X1) (i.e., P1Py). Hence, P1y = P1Py.

[Figure: An illustration of the Frisch-Waugh-Lovell Theorem: y is decomposed into P1y, (P − P1)y, and ê = (I − P)y.]

Measures of Goodness of Fit

Since ŷ′ê = 0, we have y′y = ŷ′ŷ + ê′ê, where y′y is TSS (total sum of squares), ŷ′ŷ is RSS (regression sum of squares), and ê′ê is ESS (error sum of squares).

The non-centered coefficient of determination (or non-centered R²),
R² = RSS/TSS = 1 − ESS/TSS,   (4)
measures the proportion of the total variation of yt that can be explained by the model.

It is invariant wrt measurement units of the dependent variable but not invariant wrt constant addition.

It is a relative measure such that 0 ≤ R² ≤ 1.

It is non-decreasing in the number of regressors. (Why?)

Centered R²

When the specification contains a constant term,
Σ_{t=1}^T (yt − ȳ)² = Σ_{t=1}^T (ŷt − ŷ̄)² + Σ_{t=1}^T êt²,
i.e., centered TSS = centered RSS + ESS.

The centered coefficient of determination (or centered R²),
R² = Σ_{t=1}^T (ŷt − ȳ)² / Σ_{t=1}^T (yt − ȳ)² = Centered RSS / Centered TSS = 1 − ESS / Centered TSS,
measures the proportion of the total variation of yt that can be explained by the model, excluding the effect of the constant term.

It is invariant wrt constant addition.

0 ≤ R² ≤ 1, and it is non-decreasing in the number of regressors.

It may be negative when the model does not contain a constant term.

Centered R²: Alternative Interpretation

When the specification contains a constant term,
Σ_{t=1}^T (yt − ȳ)(ŷt − ȳ) = Σ_{t=1}^T (ŷt − ȳ + êt)(ŷt − ȳ) = Σ_{t=1}^T (ŷt − ȳ)²,
because Σ_{t=1}^T ŷt êt = 0 and Σ_{t=1}^T êt = 0.

Centered R² can also be expressed as
R² = Σ_{t=1}^T (ŷt − ȳ)² / Σ_{t=1}^T (yt − ȳ)² = [Σ_{t=1}^T (yt − ȳ)(ŷt − ȳ)]² / { [Σ_{t=1}^T (yt − ȳ)²][Σ_{t=1}^T (ŷt − ȳ)²] },
the squared sample correlation coefficient of yt and ŷt, also known as the squared multiple correlation coefficient.

Models for different dep. variables are not comparable in terms of R².

Adjusted R²

The adjusted R² is
R̄² = 1 − [ê′ê/(T − k)] / [(y′y − T ȳ²)/(T − 1)].

Equivalently,
R̄² = 1 − (1 − R²)(T − 1)/(T − k) = R² − (1 − R²)(k − 1)/(T − k),
where the penalty term depends on the trade-off between model complexity and model explanatory ability.

R̄² may be negative and need not be non-decreasing in k.
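Both measures follow directly from any fitted regression. A Python sketch with simulated data (the variable names are illustrative), which also checks the squared-correlation interpretation of the centered R²:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 40, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat
e_hat = y - y_fit

ess = e_hat @ e_hat                        # error sum of squares
tss_c = np.sum((y - y.mean()) ** 2)        # centered TSS
r2 = 1 - ess / tss_c                       # centered R^2
r2_bar = 1 - (1 - r2) * (T - 1) / (T - k)  # adjusted R^2

# With a constant term, centered R^2 equals the squared correlation
# between y and the fitted values.
r2_corr = np.corrcoef(y, y_fit)[0, 1] ** 2
print(r2, r2_bar, r2_corr)
```

Note that r2_bar is always below r2 here, reflecting the degrees-of-freedom penalty.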

Example: Analysis of Suicide Rate

Q: How can the suicide rate (s) be explained jointly by the unemployment rate (u), GDP growth rate (g), and time (t) during 1981–2010?

Estimation results with gt and ut:
ŝt = 4.40 + 2.48 ut, R̄² = 0.70;
ŝt = 4.20 + 2.50 ut + 0.02 gt, R̄² = 0.69;
ŝt = 4.84 + 2.40 ut−1, R̄² = 0.68;
ŝt = 5.56 + 2.31 ut−1 − 0.08 gt−1, R̄² = 0.67.

Estimation results with t but without g:
ŝt = 4.41 + 2.41 ut + 0.01 t, R̄² = 0.69;
ŝt = 4.84 + 2.40 ut−1, R̄² = 0.68;
ŝt = 4.84 + 2.35 ut−1 + 0.01 t, R̄² = 0.67.

Estimation results with t and g:
ŝt = 4.16 + 2.43 ut + 0.03 gt + 0.01 t, R̄² = 0.68;
ŝt = 5.56 + 2.31 ut−1 − 0.08 gt−1, R̄² = 0.67;
ŝt = 5.55 + 2.29 ut−1 − 0.08 gt−1 + 0.004 t, R̄² = 0.66.

Estimation results with t and t²:
ŝt = 11.60 − 0.45 t + 0.02 t², R̄² = 0.63;
ŝt = 8.05 + 1.93 ut − 0.46 t + 0.02 t², R̄² = 0.78;
ŝt = 12.27 − 0.09 gt − 0.45 t + 0.02 t², R̄² = 0.62;
ŝt = 7.68 + 1.96 ut + 0.04 gt − 0.46 t + 0.02 t², R̄² = 0.77;
ŝt = 8.37 + 1.80 ut−1 − 0.43 t + 0.01 t², R̄² = 0.75;
ŝt = 13.02 − 0.18 gt−1 − 0.45 t + 0.02 t², R̄² = 0.64;
ŝt = 9.03 + 1.74 ut−1 − 0.07 gt−1 − 0.43 t + 0.01 t², R̄² = 0.74.

Specifications that include u together with t and t² provide a good fit of the data and a reasonable interpretation.

Q: Is there any other way to determine if a specification is “good”?

Classical Conditions

[A1] X is non-stochastic.

[A2] y is a random vector such that

(i) IE(y) = Xβ o for some β o ;

(ii) var(y) = σo2 IT for some σo2 > 0.

[A3] y is a random vector s.t. y ∼ N (Xβ o , σo2 IT ) for some β o and σo2 > 0.

The specification (1) with [A1] and [A2] is known as the classical

linear model, whereas (1) with [A1] and [A3] is the classical normal

linear model.

When var(y) = σo2 IT , the elements of y are homoskedastic and

(serially) uncorrelated.


Without Normality

The OLS variance estimator is
σ̂T² = (1/(T − k)) Σ_{t=1}^T êt².

Theorem 3.4
Consider the linear specification (1).
(a) Given [A1] and [A2](i), β̂T is unbiased for βo.
(b) Given [A1] and [A2], σ̂T² is unbiased for σo².
(c) Given [A1] and [A2], var(β̂T) = σo²(X′X)⁻¹.

Proof: By [A1], IE(β̂T) = IE[(X′X)⁻¹X′y] = (X′X)⁻¹X′ IE(y). [A2](i) gives IE(y) = Xβo, so that IE(β̂T) = (X′X)⁻¹X′Xβo = βo, proving (a). For (c),
var(β̂T) = var((X′X)⁻¹X′y) = (X′X)⁻¹X′ var(y) X(X′X)⁻¹ = σo²(X′X)⁻¹,
where the last equality follows from [A2](ii) that var(y) = σo²IT.

Proof (cont’d): As trace(IT − P) = rank(IT − P) = T − k, we have IE(ê′ê) = σo²(T − k), and hence IE(σ̂T²) = σo², proving (b).

Theorem 3.4 establishes unbiasedness of the OLS estimators β̂ T and

σ̂T2 but does not address the issue of efficiency.

By Theorem 3.4(c), the elements of β̂ T can be more precisely

estimated (i.e., with a smaller variance) when X has larger variation.

To see this, consider the simple linear regression y = α + βx + e; it can be verified that
var(β̂T) = σo² / Σ_{t=1}^T (xt − x̄)².
Thus, the larger the (squared) variation of xt (i.e., Σ_{t=1}^T (xt − x̄)²), the smaller is the variance of β̂T.
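The scalar formula above and the matrix formula σo²(X′X)⁻¹ of Theorem 3.4(c) agree on the slope variance. A quick deterministic check in Python (σo² is set to 1 purely for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
T = x.size
X = np.column_stack([np.ones(T), x])   # design matrix for y = alpha + beta*x + e
sigma2 = 1.0                           # illustrative value of sigma_o^2

# Slope variance from the matrix formula: lower-right element of sigma2 * (X'X)^{-1}.
V = sigma2 * np.linalg.inv(X.T @ X)
var_beta_matrix = V[1, 1]

# Slope variance from the scalar formula.
var_beta_scalar = sigma2 / np.sum((x - x.mean()) ** 2)

print(var_beta_matrix, var_beta_scalar)
```

The identity follows from the partitioned-inverse algebra behind the FWL theorem: concentrating out the constant leaves the centered sum of squares of x in the denominator.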

The result below establishes efficiency of β̂T among all unbiased estimators of βo that are linear in y.

Gauss-Markov Theorem
Given linear specification (1), suppose that [A1] and [A2] hold. Then the OLS estimator β̂T is the best linear unbiased estimator (BLUE) for βo.

Proof: Consider an arbitrary linear estimator β̌T = Ay, where A is a non-stochastic matrix, say, A = (X′X)⁻¹X′ + C. Then, β̌T = β̂T + Cy, and unbiasedness of β̌T for βo requires CX = 0.

Proof (cont’d): The condition CX = 0 implies cov(β̂T, Cy) = 0. Thus,
var(β̌T) = var(β̂T) + var(Cy) = var(β̂T) + σo²CC′.
This shows that var(β̌T) − var(β̂T) is the p.s.d. matrix σo²CC′, so that β̂T is more efficient than any other linear unbiased estimator β̌T.

Example: Suppose IE(y) = X1b1 and var(y) = σo²IT. Consider two specifications:

1. y = X1β1 + e, with the OLS estimator b̂1,T = (X1′X1)⁻¹X1′y.

2. y = Xβ + e = X1β1 + X2β2 + e, with the OLS estimator β̂T = (β̂1,T′ β̂2,T′)′.

Clearly, b̂1,T is the BLUE of b1 with var(b̂1,T) = σo²(X1′X1)⁻¹. By the Frisch-Waugh-Lovell Theorem, β̂1,T = [X1′(I − P2)X1]⁻¹X1′(I − P2)y.

Example (cont’d): It can be shown that var(β̂1,T) − var(b̂1,T) is p.s.d. This shows that b̂1,T is more efficient than β̂1,T, as it ought to be.

With Normality

Given [A3], the log-likelihood function of y is
log L(β, σ²) = −(T/2) log(2π) − (T/2) log σ² − (1/(2σ²))(y − Xβ)′(y − Xβ).

The score vector is
s(β, σ²) = [ (1/σ²) X′(y − Xβ) ;  −T/(2σ²) + (1/(2σ⁴))(y − Xβ)′(y − Xβ) ].

Setting the score to zero yields the maximum likelihood estimators (MLEs). Clearly, the MLE of β is the OLS estimator, and the MLE of σ² is
σ̃T² = (y − Xβ̂T)′(y − Xβ̂T)/T = ê′ê/T ≠ σ̂T².

With the normality condition on y, a lot more can be said about the OLS estimators.

Theorem 3.7
Given the linear specification (1), suppose that [A1] and [A3] hold.
(a) β̂T ∼ N(βo, σo²(X′X)⁻¹).
(b) (T − k)σ̂T²/σo² ∼ χ²(T − k).
(c) σ̂T² has mean σo² and variance 2σo⁴/(T − k).

Proof: For (a), β̂T = (X′X)⁻¹X′y is a linear transformation of y ∼ N(Xβo, σo²IT) and hence also a normal random vector. As for (b), writing ê = (IT − P)(y − Xβo), we have (T − k)σ̂T²/σo² = y*′(IT − P)y*, where y* = (y − Xβo)/σo ∼ N(0, IT).

Proof (cont’d): Let C orthogonally diagonalize IT − P such that C′(IT − P)C = Λ. Since rank(IT − P) = T − k, Λ contains T − k eigenvalues equal to one and k eigenvalues equal to zero. Then,
y*′(IT − P)y* = y*′C[C′(IT − P)C]C′y* = η′ [IT−k 0; 0 0] η,
where η = C′y* ∼ N(0, IT), whose elements are independent standard normal random variables. It follows that
y*′(IT − P)y* = Σ_{i=1}^{T−k} ηi² ∼ χ²(T − k),
proving (b). (c) is a direct consequence of (b) and the facts that χ²(T − k) has mean T − k and variance 2(T − k).

Theorem 3.8
Given the linear specification (1), suppose that [A1] and [A3] hold. Then the OLS estimators β̂T and σ̂T² are the best unbiased estimators (BUE) for βo and σo², respectively.

Proof: The Hessian of the log-likelihood function is
H(β, σ²) = [ −(1/σ²)X′X , −(1/σ⁴)X′(y − Xβ) ; −(1/σ⁴)(y − Xβ)′X , T/(2σ⁴) − (1/σ⁶)(y − Xβ)′(y − Xβ) ],
so that
IE[H(βo, σo²)] = [ −(1/σo²)X′X , 0 ; 0 , −T/(2σo⁴) ].

Proof (cont’d): By the information matrix equality, −IE[H(βo, σo²)] is the information matrix. Then, its inverse,
−IE[H(βo, σo²)]⁻¹ = [ σo²(X′X)⁻¹ , 0 ; 0 , 2σo⁴/T ],
is the Cramér-Rao lower bound. Since var(β̂T) = σo²(X′X)⁻¹ attains this bound, β̂T is the best unbiased estimator for βo. This conclusion is much stronger than the Gauss-Markov Theorem.

Although var(σ̂T²) = 2σo⁴/(T − k) is greater than the lower bound (the lower-right element), it can be shown that σ̂T² is still the best unbiased estimator for σo²; see Rao (1973, p. 319) for a proof.

Tests for Linear Hypotheses

The null hypothesis is Ho: Rβo = r, where R is a q × k matrix of full row rank with q < k, and r is a vector of hypothetical values.

A natural way to construct a test statistic is to compare Rβ̂T and r; we reject the null if their difference is too “large.”

Given [A1] and [A3], Theorem 3.7(a) states β̂T ∼ N(βo, σo²(X′X)⁻¹), so that
Rβ̂T ∼ N(Rβo, σo²R(X′X)⁻¹R′).

When q = 1, Rβ̂T and R(X′X)⁻¹R′ are scalars. Under the null hypothesis,
(Rβ̂T − r) / (σo[R(X′X)⁻¹R′]^{1/2}) = R(β̂T − βo) / (σo[R(X′X)⁻¹R′]^{1/2}) ∼ N(0, 1).
Replacing the unknown σo with σ̂T, we obtain the so-called t statistic:
τ = (Rβ̂T − r) / (σ̂T[R(X′X)⁻¹R′]^{1/2}).

Theorem 3.9
Given the linear specification (1), suppose that [A1] and [A3] hold. When R is 1 × k, τ ∼ t(T − k) under the null hypothesis.

Note: The normality condition [A3] is crucial for this t distribution result.

Proof: We write the statistic τ as
τ = { (Rβ̂T − r) / (σo[R(X′X)⁻¹R′]^{1/2}) } / { [(T − k)σ̂T²/σo²] / (T − k) }^{1/2},
where the numerator is N(0, 1) under the null, and (T − k)σ̂T²/σo² ∼ χ²(T − k) by Theorem 3.7(b). The assertion follows when the numerator and denominator are independent. This is indeed the case, because β̂T and ê are jointly normally distributed with
cov(β̂T, ê) = σo²(X′X)⁻¹X′(IT − P) = 0.

Examples

To test the hypothesis βi = c, set R as the 1 × k vector with 1 as its i-th element and 0 elsewhere, and let mii denote the i-th diagonal element of M⁻¹ = (X′X)⁻¹. Then, the resulting t statistic is:
τ = (β̂i,T − c) / (σ̂T √mii) ∼ t(T − k),
which, for c = 0, is the familiar t ratio.

It is straightforward to verify that to test aβi + bβj = c, with a, b, c given constants, the corresponding t statistic reads:
τ = (aβ̂i,T + bβ̂j,T − c) / (σ̂T [a²mii + b²mjj + 2ab mij]^{1/2}) ∼ t(T − k).
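The t ratio is straightforward to compute from (X′X)⁻¹ and σ̂T². A Python sketch with simulated data (the decision would then compare each ratio with a t(T − k) critical value from a table):

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 60, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.2, 0.0]) + rng.normal(size=T)

Minv = np.linalg.inv(X.T @ X)
beta_hat = Minv @ X.T @ y
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (T - k)   # unbiased variance estimator

# t ratios for H0: beta_i = 0, i.e., tau = beta_hat_i / (sigma_hat * sqrt(m_ii)).
t_ratios = beta_hat / np.sqrt(sigma2_hat * np.diag(Minv))

# The same statistic via the general R form, with R selecting the 2nd coefficient.
R = np.zeros(k); R[1] = 1.0
tau_R = (R @ beta_hat - 0.0) / np.sqrt(sigma2_hat * (R @ Minv @ R))
print(t_ratios)
```

The R-vector form and the m_ii shortcut coincide, since R(X′X)⁻¹R′ picks out exactly the i-th diagonal element.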

When R is a q × k matrix with full row rank q (q > 1), we have under the null hypothesis: [R(X′X)⁻¹R′]^{−1/2}(Rβ̂T − r)/σo ∼ N(0, Iq). Hence,
(Rβ̂T − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂T − r)/σo² ∼ χ²(q).
Replacing σo² with σ̂T² and dividing by q yields the F statistic:
ϕ = (Rβ̂T − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂T − r) / (σ̂T² q)
= { (Rβ̂T − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂T − r)/(σo² q) } / { (T − k)σ̂T²/[σo²(T − k)] },
a ratio of two independent χ² random variables, each divided by their corresponding degrees of freedom.

Theorem 3.10

Given the linear specification (1), suppose that [A1] and [A3] hold. When

R is q × k with full row rank, ϕ ∼ F (q, T − k) under the null hypothesis.

Notes:

1 When q = 1, ϕ ∼ F(1, T − k), and this distribution is the same as that of τ².

2 The t distribution is symmetric about zero; hence one may consider a one- or two-sided t test. On the other hand, the F distribution is non-negative and asymmetric, so it is more typical to consider only a one-sided F test.

Example: Ho: β1 = b1 and β2 = b2. The F statistic is
ϕ = (1/(2σ̂T²)) (β̂1,T − b1, β̂2,T − b2) [m11 m12; m21 m22]⁻¹ (β̂1,T − b1, β̂2,T − b2)′,
where mij denotes the (i, j)-th element of (X′X)⁻¹.

Example: Ho: β2 = · · · = βk = 0 (all coefficients other than the constant are zero). The F statistic is
ϕ = (1/((k − 1)σ̂T²)) (β̂2,T, . . . , β̂k,T) [m22 · · · m2k; . . . ; mk2 · · · mkk]⁻¹ (β̂2,T, . . . , β̂k,T)′.

Test Power

We now consider the distribution of ϕ under the alternative hypothesis: Rβo = r + δ, with R a q × k matrix with rank q < k and δ ≠ 0.

Theorem 3.11
Given the linear specification (1), suppose that [A1] and [A3] hold. When Rβo = r + δ, ϕ has a non-central F distribution F(q, T − k; δ′D⁻¹δ, 0), where D := σo²[R(X′X)⁻¹R′] and δ′D⁻¹δ is the non-centrality parameter of the numerator of ϕ.

Proof: When Rβo = r + δ,
(Rβ̂T − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂T − r)/σo² ∼ χ²(q; δ′D⁻¹δ),
a non-central χ² distribution with non-centrality parameter δ′D⁻¹δ. It is also readily seen that (T − k)σ̂T²/σo² is still distributed as χ²(T − k). Similar to the argument before, these two terms are independent, so that ϕ has a non-central F distribution.

Test power is determined by the non-centrality parameter δ 0 D−1 δ,

where δ signifies the deviation from the null. When Rβ o deviates

farther from the hypothetical value r (i.e., δ is “large”), the

non-centrality parameter δ 0 D−1 δ increases, and so does the power.

Example: The null distribution is F (2, 20), and its critical value at 5%

level is 3.49. Then for F (2, 20; ν1 , 0) with the non-centrality

parameter ν1 = 1, 3, 5, the probabilities that ϕ exceeds 3.49 are

approximately 12.1%, 28.2%, and 44.3%, respectively.

Example: The null distribution is F (5, 60), and its critical value at 5%

level is 2.37. Then for F (5, 60; ν1 , 0) with ν1 = 1, 3, 5, the

probabilities that ϕ exceeds 2.37 are approximately 9.4%, 20.5%, and

33.2%, respectively.


Alternative Interpretation

The constrained OLS estimator, computed under the constraint Rβ = r, solves the Lagrangian problem:
min_{β,λ} (1/T)(y − Xβ)′(y − Xβ) + (Rβ − r)′λ,
where λ is the q × 1 vector of Lagrange multipliers.

The sum of squared constrained OLS residuals is
ë′ë = ê′ê + (Rβ̂T − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂T − r),
where the 2nd term on the RHS is the quadratic form in the numerator of the F statistic. Letting ESSc = ë′ë and ESSu = ê′ê, we have
ϕ = (ESSc − ESSu)/(qσ̂T²) = [(ESSc − ESSu)/q] / [ESSu/(T − k)],
which compares the constrained and unconstrained models based on their lack of fit.

In terms of centered R², ϕ = [(Ru² − Rc²)/q] / [(1 − Ru²)/(T − k)]. The regression F test is thus ϕ = [Ru²/(k − 1)] / [(1 − Ru²)/(T − k)], which compares the model fitness of the full model and the model with only a constant term.

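The equivalence between the Wald form of ϕ and the (ESSc − ESSu) form can be verified directly. A Python sketch testing Ho: β3 = β4 = 0 with simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T, k, q = 80, 4, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=T)

def ess(Xm, y):
    """Sum of squared OLS residuals from regressing y on Xm."""
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)
    e = y - Xm @ b
    return e @ e

# Constrained model drops the last q regressors (their coefficients set to 0).
ess_u = ess(X, y)
ess_c = ess(X[:, : k - q], y)
phi_ess = ((ess_c - ess_u) / q) / (ess_u / (T - k))

# Wald form: R selects the last q coefficients, r = 0.
R = np.zeros((q, k)); R[0, k - 2] = 1.0; R[1, k - 1] = 1.0
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = ess_u / (T - k)
d = R @ beta_hat
phi_wald = d @ np.linalg.solve(R @ np.linalg.inv(X.T @ X) @ R.T, d) / (q * sigma2_hat)

print(phi_ess, phi_wald)
```

The two forms agree exactly, which is the algebraic content of the constrained-residuals identity above.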

Confidence Regions

A confidence interval for βi,o is an interval (g_α, ḡ_α) such that IP{g_α ≤ βi,o ≤ ḡ_α} = 1 − α.

Letting cα/2 be the critical value of t(T − k) with tail prob. α/2,
IP( |β̂i,T − βi,o| / (σ̂T √mii) ≤ cα/2 )
= IP( β̂i,T − cα/2 σ̂T √mii ≤ βi,o ≤ β̂i,T + cα/2 σ̂T √mii )
= 1 − α.
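The interval β̂i,T ± cα/2 σ̂T √mii is simple to compute once a critical value is available. A Python sketch with simulated data; the critical value below is hard-coded (2.042 for t(30) with 2.5% tail) rather than looked up programmatically:

```python
import numpy as np

rng = np.random.default_rng(5)
T, k = 34, 4                      # so that T - k = 30 degrees of freedom
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([2.0, 1.0, -0.5, 0.0]) + rng.normal(size=T)

Minv = np.linalg.inv(X.T @ X)
beta_hat = Minv @ X.T @ y
e_hat = y - X @ beta_hat
sigma_hat = np.sqrt(e_hat @ e_hat / (T - k))

c = 2.042                          # critical value of t(30) with 2.5% tail
i = 1                              # coefficient of interest
half_width = c * sigma_hat * np.sqrt(Minv[i, i])
ci = (beta_hat[i] - half_width, beta_hat[i] + half_width)
print(ci)                          # 95% confidence interval for beta_{i,o}
```

By construction the interval is centered at β̂i,T; its width shrinks with σ̂T and with mii, i.e., with more regressor variation.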

The confidence region for a vector of parameters can be constructed by resorting to the F statistic.

For (β1,o, β2,o)′ = (b1, b2)′, suppose T − k = 30 and α = 0.05. Then F0.05(2, 30) = 3.32, and
IP{ (1/(2σ̂T²)) (β̂1,T − b1, β̂2,T − b2) [m11 m12; m21 m22]⁻¹ (β̂1,T − b1, β̂2,T − b2)′ ≤ 3.32 } = 0.95,
so that the joint confidence region is an ellipse in the (b1, b2) plane.

Note: It is possible that (β1, β2) is outside the confidence box formed by individual confidence intervals but inside the joint confidence ellipse. That is, while a t ratio may indicate statistical significance of a coefficient, the F test may suggest the opposite based on the confidence region.

Example: Analysis of Suicide Rate

[Table: Part I — estimation results with u and t; coefficient estimates with t ratios in parentheses. E.g., ŝt = 4.84 + 2.35 ut−1 + 0.01 t with t ratios (5.01**), (4.74**), (0.13), R̄² = 0.67, F = 33.37**.]

** and * denote significance of a two-sided test at 1% and 5% levels, respectively.

Part II: Estimation results with t and g

[Table: coefficient estimates with t ratios in parentheses. E.g., ŝt = 4.16 + 2.43 ut + 0.03 gt + 0.01 t with t ratios (2.39*), (4.85**), (0.18), (0.21), R̄² = 0.68, F = 23.31**; and ŝt = 5.55 + 2.29 ut−1 − 0.08 gt−1 + 0.004 t with t ratios (3.18**), (4.40**), (−0.49), (0.06), R̄² = 0.66, F = 21.77**.]

F tests for the joint significance of the coefficients of g and t: 0.03 (Model 2) and 0.13 (Model 4), which are insignificant even at the 10% level.

Part III: Estimation results with t and t²

[Table: columns const, ut, ut−1, gt, gt−1, t, t², R̄²/F; t ratios in parentheses. E.g., ŝt = 7.68 + 1.96 ut + 0.04 gt − 0.46 t + 0.02 t² with t ratios (4.42**), (4.50**), (0.32), (−3.27**), (3.68**), R̄² = 0.77, F = 28.44**; and ŝt = 9.03 + 1.74 ut−1 − 0.07 gt−1 − 0.43 t + 0.01 t² with t ratios (4.84**), (3.60**), (−0.53), (−2.90**), (3.22**), R̄² = 0.74, F = 24.18**.]

F tests for the joint significance of the coefficients of g and t: 5.45* (Model 3) and 4.29* (Model 5), which are significant at the 5% level.

Selected estimation results (with more precise estimates):
ŝt = 8.37 + 1.80 ut−1 − 0.430 t + 0.0147 t², R̄² = 0.75.

The coefficient of ut−1 is 1.80, approx. 410 persons. That is, the suicide rate would increase by 1.8 when there is a one-percent increase of last year’s unemployment rate.

The time effect of this regression is −0.43 + 0.0294 t, which changes with t. At 2013, the time effect is 0.54, approx. 120 persons.

Since 1994 (about 14.6 years after 1980), there has been a natural increase of the suicide rate in Taiwan; lowering the unemployment rate would help cancel out the time effect to some extent.

The predicted and actual suicide rates in 2013 are, respectively, 17.86 and 15.3; the difference between them is 2.56, approx. 590 persons.

Near Multicollinearity

When the i-th regressor xi is highly (but not perfectly) correlated with the other regressors Xi (the columns of X excluding xi),
var(β̂i,T) = σo²[xi′(I − Pi)xi]⁻¹ = σo² / [Σ_{t=1}^T (xti − x̄i)² (1 − R²(i))],
where Pi is the orthogonal projection matrix associated with Xi and R²(i) is the centered R² from regressing xi on Xi.

Consequences of near multicollinearity:

R²(i) is high, so that var(β̂i,T) tends to be large and β̂i,T is sensitive to data changes.

Large var(β̂i,T) leads to small (insignificant) t ratios. Yet, the regression F test may suggest that the model (as a whole) is useful.

How do we circumvent the problems from near multicollinearity?

Adding more data if possible.

Dropping some regressors.

Statistical approaches:
Ridge regression: for some λ ≠ 0, β̂(λ) = (X′X + λIk)⁻¹X′y.

Note: Multicollinearity vs. “micronumerosity” (Goldberger)
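The ridge estimator perturbs X′X before solving, which stabilizes the solution when regressors are nearly collinear. A hedged Python sketch; λ = 0.1 is an arbitrary illustrative value, not a recommended choice:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 50
x1 = rng.normal(size=T)
x2 = x1 + 0.01 * rng.normal(size=T)        # nearly collinear with x1
X = np.column_stack([np.ones(T), x1, x2])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=T)

lam = 0.1                                   # illustrative ridge parameter
k = X.shape[1]
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# With near multicollinearity, the OLS coefficients on x1 and x2 are erratic;
# the ridge estimates are pulled toward more moderate values.
print(beta_ols[1:], beta_ridge[1:])
```

The price of this stabilization is bias: β̂(λ) is no longer unbiased for βo, trading bias for a (sometimes much) smaller variance.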

Digression: Regression with Dummy Variables

Example: Let yt denote the wage of the t-th individual and xt the working experience (in years). Consider the following specification:
yt = α0 + α1 Dt + β0 xt + et,
where Dt = 1 if the t-th individual is male and Dt = 0 otherwise. This specification puts together two regressions:

Regression for female: Dt = 0, and the intercept is α0.

Regression for male: Dt = 1, and the intercept is α0 + α1.

These two regressions coincide if α1 = 0. Testing no wage discrimination against female amounts to testing the hypothesis of α1 = 0.

We may also consider the specification with a dummy variable and its

interaction with a regressor:

yt = α0 + α1 Dt + β0 xt + β1 (xt Dt ) + et .

Then, the slopes of the regressions for female and male are, respectively,

β0 and β0 + β1 . These two regressions coincide if α1 = 0 and β1 = 0. In

this case, testing no wage discrimination against female amounts to

testing the joint hypothesis of α1 = 0 and β1 = 0.

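The dummy-plus-interaction specification amounts to one stacked regression with design columns [1, D, x, xD]. A Python sketch with fabricated wage data; with the full interaction, the pooled fit reproduces the two group-specific regressions exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 100
x = rng.uniform(0, 30, size=T)            # working experience in years (fabricated)
D = (rng.random(T) < 0.5).astype(float)   # 1 = male, 0 = female (illustrative)
y = 10 + 2.0 * D + 0.5 * x + 0.1 * (x * D) + rng.normal(size=T)

X = np.column_stack([np.ones(T), D, x, x * D])
a0, a1, b0, b1 = np.linalg.solve(X.T @ X, X.T @ y)

# Implied group-specific regressions:
print("female: intercept", a0, "slope", b0)
print("male:   intercept", a0 + a1, "slope", b0 + b1)

# Check: these coincide with OLS fitted on the female subsample alone.
f = D == 0
Xf = np.column_stack([np.ones(int(f.sum())), x[f]])
intercept_f, slope_f = np.linalg.solve(Xf.T @ Xf, Xf.T @ y[f])
```

Testing the joint hypothesis α1 = 0 and β1 = 0 would then use the F statistic from the previous section with q = 2.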

Example: Consider two dummy variables:

D1,t = 1 if high school is t’s highest degree and D1,t=0 otherwise;

D2,t = 1 if college or graduate is t’s highest degree and D2,t=0 otherwise.

The specification below in effect puts together 3 regressions:

regression has intercept α0 + α1 , college regression has intercept α0 + α2 .

Similar to the previous example, we may also consider a more general

specification in which x interacts with D1 and D2 .

Note: To avoid exact multicollinearity with the constant term, the number of

dummy variables in a model (with the constant term) should be one less

than the number of groups.


Example: Analysis of Suicide Rate

Let Dt = 1 for t ≥ T∗ and Dt = 0 otherwise, where T∗ is the year of structure change. Consider the specification:

st = α0 + δ Dt + β0 ut−1 + γ (ut−1 Dt ) + et .

The “before-change” regression has the intercept α0 and slope β0 , and the

“after-change” regression has the intercept α0 + δ and slope β0 + γ.

Testing a structure change at T ∗ amounts to testing δ = 0 and γ = 0

(Chow test).

Alternatively, we can estimate the specification:

st = α0 (1 − Dt ) + α1 Dt + β0 ut−1 (1 − Dt ) + β1 ut−1 Dt + et ,


Part I: Estimation results with a known change: Without t

(coefficients of the constant, Dt , ut−1 , and ut−1 Dt , with t ratios in parentheses; the last column is the F statistic)

T∗ = 1992: (2.85∗∗), (−1.00), (1.20), (0.94); 22.56∗∗

T∗ = 1993: 6.07 (2.58∗), −1.39 (−0.49), 1.75 (1.51), 0.71 (0.57); 0.66; 0.17; 21.86∗∗

T∗ = 1994: 5.58 (2.48∗), −0.35 (−0.12), 1.94 (1.72), 0.39 (0.32); 0.66; 0.14; 21.80∗∗

T∗ = 1995: 5.36 (2.46∗), 0.51 (0.17), 2.02 (1.83), 0.16 (0.13); 0.66; 0.26; 22.05∗∗

In each case, the coefficients of Dt and ut−1 Dt are insignificantly different from zero.


Part II: Estimation results with a known change: With t

(coefficients of the constant, Dt , ut−1 , t, and tDt , with t ratios in parentheses; R̄² and the F statistic in the last columns)

T∗ = 1992: (8.30∗∗), (−5.09∗∗), (2.52∗), (−4.00∗∗), (5.45∗∗); 40.21∗∗

T∗ = 1993: 11.11 (8.47∗∗), −9.97 (−4.79∗∗), 1.05 (2.47∗), −0.52 (−4.42∗∗), 0.90 (5.63∗∗); 0.83; 41.09∗∗

T∗ = 1994: 11.05 (8.42∗∗), −9.46 (−4.41∗∗), 0.97 (2.25∗), −0.48 (−4.48∗∗), 0.85 (5.53∗∗); 0.83; 40.60∗∗

T∗ = 1995: 10.92 (8.19∗∗), −8.78 (−3.92∗∗), 0.89 (1.98), −0.43 (−4.32∗∗), 0.79 (5.26∗∗); 0.83; 38.98∗∗

F test of the coefficients of Dt and tDt being zero: 15.28∗∗ (’92); 15.83∗∗

(’93); 15.52∗∗ (’94); 14.51∗∗ (’95)


We do not know T ∗ , the year of change, and hence tried estimating with

different T ∗ :

T ∗ = 1993 : ŝt = 11.11 − 9.97 Dt + 1.05 ut−1 − 0.52 t + 0.90 tDt ;

T ∗ = 1994 : ŝt = 11.05 − 9.46 Dt + 0.97 ut−1 − 0.48 t + 0.85 tDt ;

T ∗ = 1995 : ŝt = 10.92 − 8.78 Dt + 0.89 ut−1 − 0.43 t + 0.79 tDt .

With T∗ = 1994, the “before-change” and “after-change” slopes of the time

trend are −0.48 (a decrease over time) and −0.48 + 0.85 = 0.37 (an

increase over time), respectively. The marginal effect of ut−1 on st is

significant at the 5% level.

The regression line with T ∗ = 1994 predicts the suicide rate in 2013

as 17.8; the difference between the predicted and actual suicide rates

is 2.5, approximately 570 persons.


Limitation of the Classical Conditions

[A1] X non-stochastic: Economic data are rarely, if ever,

non-stochastic; also, lagged dependent variables may be used as

regressors.

[A2](i) IE(y) = Xβ o : IE(y) may be a linear function with more

regressors or a nonlinear function of regressors.

[A2](ii) var(y) = σo2 IT : The elements of y may be correlated (serial

correlation, spatial correlation) and/or may have unequal variances.

[A3] Normality: y may have a non-normal distribution.

The OLS estimator loses the properties derived before when some of

the classical conditions fail to hold.


When var(y) ≠ σo² IT

Suppose [A1] and [A2](i) hold but, in place of [A2](ii), var(y) = Σo ≠ σo² IT , where Σo is p.d. That is, the elements of y

may be correlated and have unequal variances.

β̂ T is not the BLUE for β o , and it is not the BUE for β o under

normality.

The estimator v̂ar(β̂ T ) = σ̂T² (X′X)⁻¹ is a biased estimator for

var(β̂ T ). Consequently, the t and F tests do not have t and F

distributions, even when y is normally distributed.


The GLS Estimator

Consider the transformed specification Gy = GXβ + Ge, where G is a T × T nonsingular matrix that is

non-stochastic.

GX has full column rank so that the OLS estimator can be computed:

β̂(G) = (X′G′GX)⁻¹ X′G′Gy,

which is still linear and unbiased. It would be the BLUE provided that

G is chosen such that GΣo G′ = σo² IT .

Setting G = Σo^{−1/2} , where Σo^{−1/2} = CΛ^{−1/2} C′ and C orthogonally

diagonalizes Σo (C′Σo C = Λ), we have Σo^{−1/2} Σo Σo^{−1/2}′ = IT .


With y∗ = Σo^{−1/2} y and X∗ = Σo^{−1/2} X, we have the GLS estimator:

β̂ GLS = (X′Σo⁻¹X)⁻¹ X′Σo⁻¹y. (5)

The GLS estimator minimizes the transformed sum of squares:

Q(β; Σo ) = (1/T)(y∗ − X∗β)′(y∗ − X∗β) = (1/T)(y − Xβ)′Σo⁻¹(y − Xβ).

The vector of GLS fitted values, ŷGLS = X(X′Σo⁻¹X)⁻¹X′Σo⁻¹y, is

an oblique projection of y onto span(X), because

X(X′Σo⁻¹X)⁻¹X′Σo⁻¹ is idempotent but asymmetric. The GLS

residual vector is êGLS = y − ŷGLS .

The sum of squared OLS residuals is less than the sum of squared

GLS residuals. (Why?)
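As a numerical check, the direct GLS formula coincides with OLS on the transformed data (Gy, GX) with G = Σo^{−1/2}. A small sketch with simulated data and a known diagonal Σo (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
beta = np.array([1.0, 2.0, -1.0])

# A known heteroskedastic covariance matrix Sigma (diagonal for simplicity).
sig = rng.uniform(0.5, 3.0, T)
Sigma = np.diag(sig ** 2)
y = X @ beta + sig * rng.standard_normal(T)

# Direct GLS formula: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
Si = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Equivalent route: whiten with G = Sigma^{-1/2}, then run OLS on (Gy, GX).
G = np.diag(1.0 / sig)
b_ols_transformed, *_ = np.linalg.lstsq(G @ X, G @ y, rcond=None)
```

The two routes agree up to rounding, which is exactly the equivalence between (5) and OLS on the transformed specification.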


Stochastic Properties of the GLS Estimator

Given linear specification (1), suppose that [A1] and [A2](i) hold and that

var(y) = Σo is positive definite. Then, β̂ GLS is the BLUE for β o , with

var(β̂ GLS ) = (X′Σo⁻¹X)⁻¹.

Under normality, the log-likelihood function is

log L(β; Σo ) = −(T/2) log(2π) − (1/2) log(det(Σo )) − (1/2)(y − Xβ)′Σo⁻¹(y − Xβ),

with the FOC: X′Σo⁻¹(y − Xβ) = 0. Thus, the GLS estimator is also

the MLE under normality.


Under normality, the information matrix is

IE[X′Σo⁻¹(y − Xβ)(y − Xβ)′Σo⁻¹X], evaluated at β = β o , which equals X′Σo⁻¹X.

Thus, the GLS estimator is the BUE for β o , because its covariance

matrix reaches the Cramér-Rao lower bound.

Under the null hypothesis Rβ o = r, we have

(Rβ̂ GLS − r)′[R(X′Σo⁻¹X)⁻¹R′]⁻¹(Rβ̂ GLS − r) ∼ χ²(q).

What if Σo is unknown?



The Feasible GLS Estimator

β̂ FGLS = (X′Σ̂T⁻¹X)⁻¹ X′Σ̂T⁻¹y,

where Σ̂T is an estimator of Σo .

Further difficulties in FGLS estimation:

The number of parameters in Σo is T (T + 1)/2. Estimating Σo

without some prior restrictions on Σo is practically infeasible.

Even when an estimator Σ b T is available under certain assumptions,

β̂ FGLS is a complex function of the data y and X. As such, the

finite-sample properties of the FGLS estimator are typically difficult to

derive.



Some Remarks on the Feasible GLS Estimation

FGLS estimation requires some assumptions on Σo . These assumptions are needed to reduce

the number of parameters and hence render the estimation of Σo

feasible.

The subsections below deal with the estimation under some special

assumptions, which are imposed on either the diagonal or off-diagonal

terms of Σo but not both.

Although the assumptions on Σo are weaker than the condition of

Σo = σo² IT , they are very ad hoc and cannot be easily generalized.


Tests for Heteroskedasticity

A simple form of Σo is groupwise heteroskedasticity:

     [ σ1² IT1    0       ]
Σo = [ 0          σ2² IT2 ] ,

Perform separate OLS regressions using the data in each group and

obtain the variance estimates σ̂²T1 and σ̂²T2 .

Under [A1] and [A3′] and the null hypothesis σ1² = σ2² = σo² , the F test is:

φ := σ̂²T1 / σ̂²T2 ∼ F (T1 − k, T2 − k),

because (Ti − k) σ̂²Ti / σo² ∼ χ²(Ti − k) for each group, and the two groups are independent.


More generally, for some constants c0 , c1 > 0, σt² = c0 + c1 x²tj .

The Goldfeld-Quandt test:

(1) Rearrange obs. according to the values of xj in a descending order.

(2) Divide the rearranged data set into three groups with T1 , Tm , and T2

observations, respectively.

(3) Drop the Tm observations in the middle group and perform separate

OLS regressions using the data in the first and third groups.

(4) The statistic is the ratio of the variance estimates: φ = σ̂²T1 / σ̂²T2 .

Some questions:

Can we estimate the model with all observations and then compute σ̂²T1

and σ̂²T2 based on T1 and T2 residuals?

If Σo is not diagonal, does the F test above still work?
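The Goldfeld-Quandt steps can be sketched on simulated data whose variance grows with xj (a toy illustration, not from the notes; the group sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 120
xj = rng.uniform(1, 10, T)
X = np.column_stack([np.ones(T), xj])
y = X @ np.array([1.0, 0.5]) + xj * rng.standard_normal(T)  # sd grows with xj

def group_var(Xg, yg):
    """sigma_hat^2 = RSS / (T_g - k) from a group-specific OLS fit."""
    b, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
    e = yg - Xg @ b
    return (e @ e) / (len(yg) - Xg.shape[1])

# (1) sort by xj in descending order; (2)-(3) drop the middle third.
order = np.argsort(-xj)
Xs, ys = X[order], y[order]
T1 = T2 = T // 3
s2_high = group_var(Xs[:T1], ys[:T1])    # large-xj group
s2_low = group_var(Xs[-T2:], ys[-T2:])   # small-xj group

phi = s2_high / s2_low
# compare phi with the critical value of F(T1 - k, T2 - k)
```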



GLS and FGLS Estimation

Under groupwise heteroskedasticity,

            [ σ1⁻¹ IT1    0        ]
Σo^{−1/2} = [ 0           σ2⁻¹ IT2 ] ,

and the transformed specification is

[ y1 /σ1 ]   [ X1 /σ1 ]       [ e1 /σ1 ]
[ y2 /σ2 ] = [ X2 /σ2 ] β  +  [ e2 /σ2 ] .

Clearly, var(Σo^{−1/2} y) = IT . The GLS estimator is:

β̂ GLS = ( X1′X1 /σ1² + X2′X2 /σ2² )⁻¹ ( X1′y1 /σ1² + X2′y2 /σ2² ).


With σ̂²T1 and σ̂²T2 from separate regressions, an estimator of Σo is

    [ σ̂²T1 IT1    0         ]
Σ̂ = [ 0           σ̂²T2 IT2 ] ,

and the FGLS estimator is

β̂ FGLS = ( X1′X1 /σ̂1² + X2′X2 /σ̂2² )⁻¹ ( X1′y1 /σ̂1² + X2′y2 /σ̂2² ).

When σt² = c x²tj , transform the specification as

yt /xtj = βj + β1 (1/xtj ) + · · · + βj−1 (xt,j−1 /xtj ) + βj+1 (xt,j+1 /xtj ) + · · · + βk (xtk /xtj ) + et /xtj ,

where var(yt /xtj ) = c := σo² . Here, the GLS estimator is readily computed

as the OLS estimator for the transformed specification.
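The two-group FGLS estimator above can be sketched as follows, with σ̂²T1 and σ̂²T2 from separate OLS fits (simulated data; all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T1, T2 = 60, 60
X1 = np.column_stack([np.ones(T1), rng.standard_normal(T1)])
X2 = np.column_stack([np.ones(T2), rng.standard_normal(T2)])
beta = np.array([1.0, 2.0])
y1 = X1 @ beta + 1.0 * rng.standard_normal(T1)   # group 1: sigma_1 = 1
y2 = X2 @ beta + 3.0 * rng.standard_normal(T2)   # group 2: sigma_2 = 3

def sigma2_hat(X, y):
    """Group-specific variance estimate RSS / (T_g - k)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return (e @ e) / (len(y) - X.shape[1])

s1, s2 = sigma2_hat(X1, y1), sigma2_hat(X2, y2)

# FGLS: weight each group by its estimated variance.
A = X1.T @ X1 / s1 + X2.T @ X2 / s2
c = X1.T @ y1 / s1 + X2.T @ y2 / s2
b_fgls = np.linalg.solve(A, c)
```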


Discussion and Remarks

What if the diagonal elements of Σo take multiple values (so that

there are more than 2 groups)?

A general form of heteroskedasticity: σt² = h(α0 + z′t α1 ), with h

unknown, zt a p × 1 vector, and p a fixed number less than T .

When the F test rejects the null of homoskedasticity, groupwise

heteroskedasticity need not be a correct description of Σo .

When the form of heteroskedasticity is incorrectly specified, the

resulting FGLS estimator may be less efficient than the OLS estimator.

The finite-sample properties of FGLS estimators and hence the exact

tests are typically unknown.


Serial Correlation

When time series data yt are correlated over time, they are said to

exhibit serial correlation. For cross-section data, the correlations of yt

are known as spatial correlation.

A general form of Σo is that its diagonal elements (variances of yt )

are a constant σo2 , and the off-diagonal elements (cov(yt , yt−i )) are

non-zero.

In the time series context, cov(yt , yt−i ) are known as the

autocovariances of yt , and the autocorrelations of yt are

corr(yt , yt−i ) = cov(yt , yt−i ) / √( var(yt ) var(yt−i ) ) = cov(yt , yt−i ) / σo² .


Simple Model: AR(1) Disturbances

Weak stationarity: the mean, variance, and autocovariances are all independent of t.

A leading example: i.i.d. random variables.

White noise: A time series with zero mean, a constant variance, and

zero autocovariances.

Disturbance: ε := y − Xβ o so that var(y) = var(ε) = IE(εε′).

Suppose that εt follows a weakly stationary AR(1) (autoregressive of

order 1) process: εt = ψ1 εt−1 + ut , |ψ1 | < 1, where ut is white noise with

IE(ut ) = 0, IE(u²t ) = σu² , and IE(ut uτ ) = 0 for t ≠ τ .


By recursive substitution,

εt = Σ_{i=0}^{∞} ψ1^i ut−i ,

which is a weakly stationary process because:

IE(εt ) = 0 and var(εt ) = Σ_{i=0}^{∞} ψ1^{2i} σu² = σu² / (1 − ψ1² );

cov(εt , εt−1 ) = ψ1 var(εt−1 ) so that corr(εt , εt−1 ) = ψ1 ;

cov(εt , εt−2 ) = ψ1 cov(εt−1 , εt−2 ) so that corr(εt , εt−2 ) = ψ1² . Thus, corr(εt , εt−i ) = ψ1^i .


The variance-covariance matrix var(y) is thus

          [ 1           ψ1          ψ1²         · · ·  ψ1^{T−1} ]
          [ ψ1          1           ψ1          · · ·  ψ1^{T−2} ]
Σo = σo²  [ ψ1²         ψ1          1           · · ·  ψ1^{T−3} ]
          [ ⋮           ⋮           ⋮           ⋱      ⋮        ]
          [ ψ1^{T−1}    ψ1^{T−2}    ψ1^{T−3}    · · ·  1        ] ,

with σo² = σu² / (1 − ψ1² ). Note that all off-diagonal elements of this matrix

are non-zero, but there are only two unknown parameters.
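Because only two parameters pin down the whole T × T matrix, Σo is easy to construct; a small sketch (function name illustrative):

```python
import numpy as np

def ar1_cov(T, psi, sigma_u):
    """Sigma_o for AR(1) disturbances: sigma_o^2 * psi^{|t-s|},
    with sigma_o^2 = sigma_u^2 / (1 - psi^2).  Two parameters
    (psi, sigma_u) determine all T(T+1)/2 distinct entries."""
    idx = np.arange(T)
    sigma_o2 = sigma_u ** 2 / (1.0 - psi ** 2)
    return sigma_o2 * psi ** np.abs(idx[:, None] - idx[None, :])

S = ar1_cov(5, 0.6, 1.0)
```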


A transformation matrix for GLS estimation is the following Σo^{−1/2} :

                   [ 1               0               0              · · ·  0               0           ]
                   [ −ψ1/√(1−ψ1²)   1/√(1−ψ1²)     0              · · ·  0               0           ]
Σo^{−1/2} = (1/σo) [ 0               −ψ1/√(1−ψ1²)   1/√(1−ψ1²)    · · ·  0               0           ]
                   [ ⋮               ⋮               ⋮              ⋱      ⋮               ⋮           ]
                   [ 0               0               0              · · ·  −ψ1/√(1−ψ1²)   1/√(1−ψ1²) ] .

Any matrix that is a constant proportion to Σo^{−1/2} can also serve as a

legitimate transformation matrix for GLS estimation.


The Cochrane-Orcutt Transformation is based on:

                                    [ √(1−ψ1²)   0     0    · · ·  0     0 ]
                                    [ −ψ1        1     0    · · ·  0     0 ]
Vo^{−1/2} = σo √(1−ψ1²) Σo^{−1/2} = [ 0          −ψ1   1    · · ·  0     0 ]
                                    [ ⋮          ⋮     ⋮    ⋱      ⋮     ⋮ ]
                                    [ 0          0     0    · · ·  −ψ1   1 ] ,

and the transformed data are: y∗ = Vo^{−1/2} y and X∗ = Vo^{−1/2} X with

yt∗ = yt − ψ1 yt−1 , x∗t = xt − ψ1 xt−1 , t = 2, · · · , T ,

and y1∗ = √(1−ψ1²) y1 , x∗1 = √(1−ψ1²) x1 .


Model Extensions

AR(p) process: εt = ψ1 εt−1 + · · · + ψp εt−p + ut .

MA(1) (moving average of order 1) process: εt = ut − π1 ut−1 , so that

IE(εt ) = 0, var(εt ) = (1 + π1² ) σu² ,

cov(εt , εt−1 ) = −π1 σu² , and cov(εt , εt−i ) = 0 for i ≥ 2.


Tests for AR(1) Disturbances

Under AR(1), the null hypothesis is ψ1 = 0. A natural estimator of ψ1 is

the OLS estimator of regressing êt on êt−1 :

ψ̂T = ( Σ_{t=2}^{T} êt êt−1 ) / ( Σ_{t=2}^{T} ê²t−1 ).

The Durbin-Watson statistic is

d = ( Σ_{t=2}^{T} (êt − êt−1 )² ) / ( Σ_{t=1}^{T} ê²t ).

Expanding the numerator,

d = 2 − 2 ψ̂T ( Σ_{t=2}^{T} ê²t−1 / Σ_{t=1}^{T} ê²t ) − ( ê1² + êT² ) / Σ_{t=1}^{T} ê²t ≈ 2(1 − ψ̂T ).
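Computing d and ψ̂T from a residual series, and checking the approximation d ≈ 2(1 − ψ̂T ), can be sketched as follows (simulated AR(1) residuals with ψ1 = 0.7; illustrative only):

```python
import numpy as np

def dw_and_psi(e):
    """Durbin-Watson d and the AR(1) coefficient estimate psi_hat
    from a residual vector e; d is approximately 2*(1 - psi_hat)."""
    d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
    psi = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
    return d, psi

rng = np.random.default_rng(5)
u = rng.standard_normal(500)
e = np.empty(500)
e[0] = u[0]
for t in range(1, 500):
    e[t] = 0.7 * e[t - 1] + u[t]   # AR(1) residual-like series

d, psi = dw_and_psi(e)   # d well below 2 signals positive serial correlation
```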


For 0 < ψ̂T ≤ 1, we have 0 ≤ d < 2 (for −1 ≤ ψ̂T < 0, 2 < d ≤ 4), so there may

be positive (negative) serial correlation. Hence, d essentially checks

whether ψ̂T is “close” to zero (i.e., d is “close” to 2).

Difficulty: The exact null distribution of d holds only under the

classical conditions [A1] and [A3] and depends on the data matrix X.

Thus, the critical values for d cannot be tabulated, and this test is

not pivotal.

The null distribution of d lies between a lower bound (dL ) and an

upper bound (dU ), so the true critical value satisfies

d∗L,α < d∗α < d∗U,α ,

and the critical values d∗L,α and d∗U,α can be tabulated.


Durbin-Watson test:

(1) Reject the null if d < d∗L,α (d > 4 − d∗L,α ).

(2) Do not reject the null if d > d∗U,α (d < 4 − d∗U,α ).

(3) The test is inconclusive if d∗L,α < d < d∗U,α (4 − d∗L,α > d > 4 − d∗U,α ).

For the specification yt = β1 + β2 xt2 + · · · + βk xtk + γ yt−1 + et ,

Durbin’s h statistic is

h = ψ̂T √( T / (1 − T v̂ar(γ̂T )) ) ≈ N (0, 1) in large samples,

with v̂ar(γ̂T ) the OLS estimate of var(γ̂T ).

Note: v̂ar(γ̂T ) cannot be greater than 1/T . (Why?)


FGLS Estimation

With AR(1) disturbances, Σo is determined by ψ1 ; write Vo = V(ψ1 ). Based on V(ψ)^{−1/2} , the transformed data are

yt (ψ) = yt − ψ yt−1 , xt (ψ) = xt − ψ xt−1 , t = 2, · · · , T .

The iterated Cochrane-Orcutt procedure:

(1) Perform OLS estimation and compute ψ̂T using the OLS residuals êt .

(2) Perform the Cochrane-Orcutt transformation based on ψ̂T and compute

the resulting FGLS estimate β̂ FGLS by regressing yt (ψ̂T ) on xt (ψ̂T ).

(3) Compute a new ψ̂T with êt replaced by êt,FGLS = yt − x′t β̂ FGLS .

(4) Repeat steps (2) and (3) until ψ̂T converges numerically.

Steps (1) and (2) suffice for FGLS estimation; more iterations may

improve the performance in finite samples.
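The iterated procedure above can be sketched as follows (simulated AR(1) errors with ψ1 = 0.6; function names illustrative, not from the notes):

```python
import numpy as np

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def cochrane_orcutt(X, y, n_iter=10, tol=1e-8):
    """Iterated Cochrane-Orcutt FGLS for AR(1) disturbances.
    The transformed regression drops the first observation."""
    b = ols(X, y)                   # step (1): OLS
    psi_prev = None
    for _ in range(n_iter):
        e = y - X @ b               # structural residuals
        psi = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        y_s = y[1:] - psi * y[:-1]  # quasi-differenced data
        X_s = X[1:] - psi * X[:-1]
        b = ols(X_s, y_s)           # step (2): FGLS
        if psi_prev is not None and abs(psi - psi_prev) < tol:
            break                   # step (4): converged
        psi_prev = psi
    return b, psi

rng = np.random.default_rng(6)
T = 400
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
eps = np.zeros(T)
u = rng.standard_normal(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + u[t]
y = X @ np.array([1.0, 2.0]) + eps

b_fgls, psi_hat = cochrane_orcutt(X, y)
```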


Instead of estimating ψ1 based on OLS residuals, the Hildreth-Lu

procedure adopts grid search to find a suitable ψ ∈ (−1, 1):

For each ψ on the grid, compute the resulting FGLS estimate (by regressing yt (ψ) on xt (ψ))

and the ESS based on the FGLS residuals.

Try every ψ on the grid; the ψ whose corresponding ESS is the

smallest is chosen.

The results depend on the grid.

Grid search becomes impractical when εt follows an AR(p) process with p > 2.


Application: Linear Probability Model

When yt is a binary variable taking the values zero and one, the linear specification yt = x′t β + et is known as

the linear probability model.

Problems with the linear probability model:

Under [A1] and [A2](i), there is heteroskedasticity: var(yt ) = x′t β o (1 − x′t β o ), which varies with t.

The OLS fitted values x′t β̂ T need not be bounded between 0 and 1.



An FGLS estimator may be obtained using

Σ̂T^{−1/2} = diag{ [x′1 β̂ T (1 − x′1 β̂ T )]^{−1/2} , . . . , [x′T β̂ T (1 − x′T β̂ T )]^{−1/2} }.

Σ̂T^{−1/2} cannot be computed if x′t β̂ T is not bounded between 0 and 1.

Even when Σ̂T^{−1/2} is available, there is no guarantee that the FGLS

fitted values are bounded between 0 and 1.

The finite-sample properties of the FGLS estimator are unknown.

A key issue: A linear model here fails to take into account data

characteristics.
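A sketch of the FGLS weights for the linear probability model, simply dropping observations whose fitted values fall outside (0, 1) (one ad hoc remedy among several; simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 300
x = rng.standard_normal(T)
X = np.column_stack([np.ones(T), x])
p = 0.3 + 0.1 * x                      # true P(y_t = 1)
y = (rng.random(T) < p).astype(float)  # binary dependent variable

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b_ols

# FGLS weights [p_hat(1 - p_hat)]^{-1/2} exist only for fitted
# values strictly inside (0, 1); other observations are dropped here.
ok = (fitted > 0) & (fitted < 1)
w = 1.0 / np.sqrt(fitted[ok] * (1.0 - fitted[ok]))
b_fgls, *_ = np.linalg.lstsq(w[:, None] * X[ok], w * y[ok], rcond=None)
```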


Application: Seemingly Unrelated Regressions

Consider a system of N equations, each with ki explanatory variables and T obs:

yi = Xi β i + ei , i = 1, 2, . . . , N.

Stacking the equations gives y = Xβ + e:

[ y1 ]   [ X1  0   · · ·  0  ] [ β1 ]   [ e1 ]
[ y2 ] = [ 0   X2  · · ·  0  ] [ β2 ] + [ e2 ]
[ ⋮  ]   [ ⋮   ⋮   ⋱      ⋮  ] [ ⋮  ]   [ ⋮  ]
[ yN ]   [ 0   0   · · ·  XN ] [ βN ]   [ eN ] ,

where y is TN × 1, X is TN × Σ_{i=1}^{N} ki , and β is Σ_{i=1}^{N} ki × 1.


Suppose yit and yjt are contemporaneously correlated, but yit and yjτ

are serially uncorrelated, i.e., cov(yi , yj ) = σij IT .

For this system, Σo = So ⊗ IT with

     [ σ1²   σ12   · · ·  σ1N ]
So = [ σ21   σ2²   · · ·  σ2N ] ;
     [ ⋮     ⋮     ⋱      ⋮   ]
     [ σN1   σN2   · · ·  σN² ]

that is, the SUR system has spatial (contemporaneous) correlations but no serial correlations.

As Σo⁻¹ = So⁻¹ ⊗ IT , the GLS estimator is

β̂ GLS = [X′(So⁻¹ ⊗ IT )X]⁻¹ X′(So⁻¹ ⊗ IT )y,

with var(β̂ GLS ) = [X′(So⁻¹ ⊗ IT )X]⁻¹.


Remarks:

When σij = 0 for i 6= j, So is diagonal, and so is Σo . Then, the GLS

estimator for each β i reduces to the corresponding OLS estimator, so

that joint estimation of N equations is not necessary.

If all equations in the system have the same regressors, i.e., Xi = X0

(say) and X = IN ⊗ X0 , the GLS estimator is also the same as the OLS

estimator.

More generally, there would not be much efficiency gain for GLS

estimation if yi and yj are less correlated and/or Xi and Xj are highly

correlated.

The FGLS estimator can be computed as

β̂ FGLS = [X′(ŜTN⁻¹ ⊗ IT )X]⁻¹ X′(ŜTN⁻¹ ⊗ IT )y.
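A two-equation SUR sketch: equation-by-equation OLS residuals give the N × N estimate Ŝ, and FGLS then weights with Ŝ⁻¹ ⊗ IT (simulated data; all names and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, k = 2, 100, 2
X_blocks = [np.column_stack([np.ones(T), rng.standard_normal(T)])
            for _ in range(N)]
betas = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]

# Contemporaneously correlated errors across the two equations.
L = np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 2.0]]))
E = rng.standard_normal((T, N)) @ L.T
ys = [X_blocks[i] @ betas[i] + E[:, i] for i in range(N)]

# Stack the system: y is TN x 1, X is block-diagonal.
y = np.concatenate(ys)
X = np.zeros((N * T, N * k))
for i in range(N):
    X[i * T:(i + 1) * T, i * k:(i + 1) * k] = X_blocks[i]

# Step 1: equation-by-equation OLS residuals -> S_hat (N x N).
resid = np.column_stack(
    [ys[i] - X_blocks[i] @ np.linalg.lstsq(X_blocks[i], ys[i], rcond=None)[0]
     for i in range(N)])
S_hat = resid.T @ resid / T

# Step 2: FGLS with Sigma_hat^{-1} = S_hat^{-1} (Kronecker) I_T.
W = np.kron(np.linalg.inv(S_hat), np.eye(T))
b_fgls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```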


ŜTN is an N × N matrix:

             [ ê′1 ]
ŜTN = (1/T)  [ ê′2 ]  [ ê1  ê2  . . .  êN ] ,
             [ ⋮   ]
             [ ê′N ]

i.e., its (i, j)-th element is ê′i êj /T , where êi is the OLS residual vector of the i-th equation.

The estimator ŜTN is valid provided that var(yi ) = σi² IT and

cov(yi , yj ) = σij IT . Without these assumptions, FGLS estimation

would be more complicated.

Again, the finite-sample properties of the FGLS estimator are

unknown.

