

Lecture 6: Regression Analysis


MIT 18.S096
Dr. Kempthorne

Fall 2013


Outline

Regression Analysis
Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation


Multiple Linear Regression: Setup

Data Set
  $n$ cases: $i = 1, 2, \ldots, n$
  1 response (dependent) variable: $y_i$, $i = 1, 2, \ldots, n$
  $p$ explanatory (independent) variables: $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$, $i = 1, 2, \ldots, n$

Goal of Regression Analysis:
  Extract/exploit the relationship between $y_i$ and $\mathbf{x}_i$.

Examples
  Prediction
  Causal Inference
  Approximation
  Functional Relationships

General Linear Model: For each case $i$, the conditional distribution $[y_i \mid \mathbf{x}_i]$ is given by
  $y_i = \hat{y}_i + \epsilon_i$
where
  $\hat{y}_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p}$
  $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ are the $p$ regression parameters (constant over all cases)
  $\epsilon_i$ is the residual (error) variable (varies over all cases)

Extensive breadth of possible models
  Polynomial approximation: $x_{i,j} = (x_i)^j$ (the explanatory variables are different powers of the same variable $x = x_i$)
  Fourier series: $x_{i,j} = \sin(j x_i)$ or $\cos(j x_i)$ (the explanatory variables are different sin/cos terms of a Fourier series expansion)
  Time series regressions: time is indexed by $i$, and the explanatory variables include lagged response values

Note: Linearity of $\hat{y}_i$ (in the regression parameters) is maintained even when the explanatory variables are non-linear functions of $x$.
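To make this breadth concrete, here is a minimal numpy sketch (not part of the original slides; the grid of 50 points and the choice p = 4 are arbitrary illustrative assumptions) that builds polynomial and Fourier design matrices from a single variable x:

import numpy as np

# Design matrices whose columns are nonlinear functions of one variable x;
# the model remains linear in the parameters beta.
x = np.linspace(0.0, 1.0, 50)          # hypothetical sample of n = 50 points
p = 4

# Polynomial approximation: x_{i,j} = (x_i)^j, j = 1, ..., p
X_poly = np.column_stack([x**j for j in range(1, p + 1)])

# Fourier series: sin(j x_i) and cos(j x_i) terms
X_fourier = np.column_stack(
    [f(j * x) for j in range(1, p + 1) for f in (np.sin, np.cos)]
)
print(X_poly.shape, X_fourier.shape)   # (50, 4) (50, 8)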

Steps for Fitting a Model

(1) Propose a model in terms of
    the response variable $Y$ (specify the scale),
    the explanatory variables $X_1, X_2, \ldots, X_p$ (include different functions of the explanatory variables if appropriate), and
    assumptions about the distribution of $\epsilon$ over the cases.
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and go to (1).


Specifying Assumptions in (1) for the Residual Distribution

  Gauss-Markov: zero mean, constant variance, uncorrelated
  Normal-linear models: the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s
  Generalized Gauss-Markov: zero mean, and general covariance matrix (possibly correlated, possibly heteroscedastic)
  Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto, Contaminated normal: some fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s; the remaining fraction $(\delta)$ follows some contamination distribution)


Specifying the Estimator Criterion in (2)

  Least Squares
  Maximum Likelihood
  Robust (contamination-resistant)
  Bayes (assume the $\beta_j$ are r.v.s with a known prior distribution)
  Accommodating incomplete/missing data

Case Analyses for (4), Checking Assumptions

  Residual analysis: the model errors $\epsilon_i$ are unobservable; the model residuals for fitted regression parameters $\hat{\beta}_j$ are
    $e_i = y_i - [\hat{\beta}_1 x_{i,1} + \hat{\beta}_2 x_{i,2} + \cdots + \hat{\beta}_p x_{i,p}]$
  Influence diagnostics (identify cases which are highly influential)
  Outlier detection


Ordinary Least Squares Estimates

Least Squares Criterion: For $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$, define
  $Q(\beta) = \sum_{i=1}^n [y_i - \hat{y}_i]^2 = \sum_{i=1}^n [y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p})]^2$
The Ordinary Least-Squares (OLS) estimate $\hat{\beta}$ minimizes $Q(\beta)$.

Matrix Notation
  $\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$, $\quad \mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}$, $\quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}$


Solving for the OLS Estimate

  $\hat{\mathbf{y}} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n)^T = \mathbf{X}\beta$, and
  $Q(\beta) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$

The OLS estimate $\hat{\beta}$ solves $\dfrac{\partial Q(\beta)}{\partial \beta_j} = 0$, $j = 1, 2, \ldots, p$, where

  $\dfrac{\partial Q(\beta)}{\partial \beta_j} = \sum_{i=1}^n 2\,[y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]\cdot(-x_{i,j})$
  $= -2\,(\mathbf{X}_{[j]})^T(\mathbf{y} - \mathbf{X}\beta)$, where $\mathbf{X}_{[j]}$ is the $j$th column of $\mathbf{X}$.

Solving for the OLS Estimate

  $\dfrac{\partial Q}{\partial \beta} = \begin{pmatrix} \partial Q/\partial \beta_1 \\ \partial Q/\partial \beta_2 \\ \vdots \\ \partial Q/\partial \beta_p \end{pmatrix} = -2\begin{pmatrix} \mathbf{X}_{[1]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \mathbf{X}_{[2]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \vdots \\ \mathbf{X}_{[p]}^T(\mathbf{y} - \mathbf{X}\beta) \end{pmatrix} = -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta)$

So the OLS estimate $\hat{\beta}$ solves the Normal Equations:
  $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat{\beta}) = \mathbf{0}$
  $\mathbf{X}^T\mathbf{X}\hat{\beta} = \mathbf{X}^T\mathbf{y}$
  $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

N.B. For $\hat{\beta}$ to exist (uniquely), $(\mathbf{X}^T\mathbf{X})$ must be invertible, i.e., $\mathbf{X}$ must have full column rank.
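As a concrete illustration, here is a minimal numpy sketch (simulated data; the dimensions and coefficients are arbitrary assumptions, not from the slides) that solves the normal equations directly and cross-checks against np.linalg.lstsq, which is numerically preferable when X^T X is ill-conditioned:

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))            # simulated design matrix (full column rank)
beta = np.array([1.0, -2.0, 0.5])      # hypothetical true parameters
y = X @ beta + rng.normal(scale=0.3, size=n)

# Solve the normal equations (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent solution via least squares, numerically safer for ill-conditioned X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))   # True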



(Ordinary) Least Squares Fit

OLS Estimate:
  $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)^T = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

Fitted Values:
  $\hat{\mathbf{y}} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix} = \begin{pmatrix} x_{1,1}\hat{\beta}_1 + \cdots + x_{1,p}\hat{\beta}_p \\ x_{2,1}\hat{\beta}_1 + \cdots + x_{2,p}\hat{\beta}_p \\ \vdots \\ x_{n,1}\hat{\beta}_1 + \cdots + x_{n,p}\hat{\beta}_p \end{pmatrix} = \mathbf{X}\hat{\beta} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y}$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the $n \times n$ Hat Matrix.



(Ordinary) Least Squares Fit

The Hat Matrix $\mathbf{H}$ projects $\mathbb{R}^n$ onto the column space of $\mathbf{X}$.

Residuals: $\hat{\epsilon}_i = y_i - \hat{y}_i$, $i = 1, 2, \ldots, n$:
  $\hat{\epsilon} = (\hat{\epsilon}_1, \hat{\epsilon}_2, \ldots, \hat{\epsilon}_n)^T = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I}_n - \mathbf{H})\mathbf{y}$

Normal Equations: $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat{\beta}) = \mathbf{X}^T\hat{\epsilon} = \mathbf{0}_p$

N.B. The least-squares residual vector $\hat{\epsilon}$ is orthogonal to the column space of $\mathbf{X}$.
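These projection identities can be checked numerically. A small simulated-data sketch (illustrative, not from the slides): H should be idempotent (H² = H), and the residual vector should be orthogonal to every column of X:

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                # simulated full-rank design
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix H = X (X^T X)^{-1} X^T
y_hat = H @ y                              # fitted values y_hat = H y
eps_hat = y - y_hat                        # residuals (I_n - H) y

print(np.allclose(H @ H, H))               # H is idempotent (a projection): True
print(np.allclose(X.T @ eps_hat, 0.0))     # residuals orthogonal to col(X): True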

Gauss-Markov Theorem: Assumptions

Data $\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$ and $\mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}$
follow a linear model satisfying the Gauss-Markov assumptions if $\mathbf{y}$ is an observation of the random vector $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ and

  $E(\mathbf{Y} \mid \mathbf{X}, \beta) = \mathbf{X}\beta$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the $p$-vector of regression parameters, and
  $Cov(\mathbf{Y} \mid \mathbf{X}, \beta) = \sigma^2\mathbf{I}_n$, for some $\sigma^2 > 0$.

I.e., the random variables generating the observations are uncorrelated and have constant variance $\sigma^2$ (conditional on $\mathbf{X}$ and $\beta$).

Gauss-Markov Theorem

For known constants $c_1, c_2, \ldots, c_p, c_{p+1}$, consider the problem of estimating
  $\theta = c_1\beta_1 + c_2\beta_2 + \cdots + c_p\beta_p + c_{p+1}$.
Under the Gauss-Markov assumptions, the estimator
  $\hat{\theta} = c_1\hat{\beta}_1 + c_2\hat{\beta}_2 + \cdots + c_p\hat{\beta}_p + c_{p+1}$,
where $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are the least squares estimates, is

  1) an Unbiased Estimator of $\theta$, and
  2) a Linear Estimator of $\theta$, that is, $\hat{\theta} = \sum_{i=1}^n b_i y_i$ for some known (given $\mathbf{X}$) constants $b_i$.

Theorem: Under the Gauss-Markov assumptions, the estimator $\hat{\theta}$ has the smallest (Best) variance among all Linear Unbiased Estimators of $\theta$; i.e., $\hat{\theta}$ is BLUE.

Gauss-Markov Theorem: Proof

Proof: Without loss of generality, assume $c_{p+1} = 0$ and define $\mathbf{c} = (c_1, c_2, \ldots, c_p)^T$.
The least squares estimate of $\theta = \mathbf{c}^T\beta$ is
  $\hat{\theta} = \mathbf{c}^T\hat{\beta} = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \equiv \mathbf{d}^T\mathbf{y}$,
a linear estimate in $\mathbf{y}$ given by the coefficients $\mathbf{d} = (d_1, d_2, \ldots, d_n)^T$.
Consider an alternative linear estimate of $\theta$:
  $\tilde{\theta} = \mathbf{b}^T\mathbf{y}$
with fixed coefficients given by $\mathbf{b} = (b_1, \ldots, b_n)^T$.
Define $\mathbf{f} = \mathbf{b} - \mathbf{d}$ and note that
  $\tilde{\theta} = \mathbf{b}^T\mathbf{y} = (\mathbf{d} + \mathbf{f})^T\mathbf{y} = \hat{\theta} + \mathbf{f}^T\mathbf{y}$
If $\tilde{\theta}$ is unbiased, then because $\hat{\theta}$ is unbiased,
  $0 = E(\mathbf{f}^T\mathbf{y}) = \mathbf{f}^T E(\mathbf{y}) = \mathbf{f}^T(\mathbf{X}\beta)$ for all $\beta \in \mathbb{R}^p$
  $\Longrightarrow \mathbf{f}$ is orthogonal to the column space of $\mathbf{X}$
  $\Longrightarrow \mathbf{f}$ is orthogonal to $\mathbf{d} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}$

If $\tilde{\theta}$ is unbiased, then the orthogonality of $\mathbf{f}$ to $\mathbf{d}$ implies
  $Var(\tilde{\theta}) = Var(\mathbf{b}^T\mathbf{y}) = Var(\mathbf{d}^T\mathbf{y} + \mathbf{f}^T\mathbf{y})$
  $= Var(\mathbf{d}^T\mathbf{y}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,Cov(\mathbf{d}^T\mathbf{y}, \mathbf{f}^T\mathbf{y})$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T Cov(\mathbf{y})\,\mathbf{f}$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T(\sigma^2\mathbf{I}_n)\mathbf{f}$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2\,\mathbf{d}^T\mathbf{f}$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2 \cdot 0$
  $\geq Var(\hat{\theta})$
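The structure of this argument can be checked directly in code. A minimal sketch (simulated X; the perturbation f is an arbitrary vector projected off the column space of X, an assumption for illustration): any unbiased competitor b = d + f has variance sigma^2 (d'd + f'f), which is at least the OLS variance sigma^2 d'd:

import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
c = np.array([1.0, 0.0])                   # estimate theta = beta_1

d = X @ np.linalg.solve(X.T @ X, c)        # OLS weights: theta_hat = d^T y
f = rng.normal(size=n)
f -= X @ np.linalg.solve(X.T @ X, X.T @ f) # project f off col(X), so X^T f = 0
b = d + f                                  # alternative unbiased weights

sigma2 = 1.0                               # assumed error variance
var_ols = sigma2 * (d @ d)                 # Var(d^T y) = sigma^2 d^T d
var_alt = sigma2 * (b @ b)                 # Var(b^T y) = sigma^2 (d^T d + f^T f)
print(var_ols <= var_alt)                  # True: OLS is BLUE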


Generalized Least Squares (GLS) Estimates

Consider generalizing the Gauss-Markov assumptions for the linear regression model to
  $\mathbf{Y} = \mathbf{X}\beta + \epsilon$
where the random $n$-vector $\epsilon$ satisfies $E[\epsilon] = \mathbf{0}_n$ and $E[\epsilon\epsilon^T] = \sigma^2\Sigma$:
  $\sigma^2$ is an unknown scale parameter;
  $\Sigma$ is a known $(n \times n)$ positive definite matrix specifying the relative variances and correlations of the component observations.

Transform the data $(\mathbf{Y}, \mathbf{X})$ to $\mathbf{Y}^* = \Sigma^{-1/2}\mathbf{Y}$ and $\mathbf{X}^* = \Sigma^{-1/2}\mathbf{X}$, and the model becomes
  $\mathbf{Y}^* = \mathbf{X}^*\beta + \epsilon^*$, where $E[\epsilon^*] = \mathbf{0}_n$ and $E[\epsilon^*(\epsilon^*)^T] = \sigma^2\mathbf{I}_n$.
By the Gauss-Markov Theorem, the BLUE (GLS) estimate of $\beta$ is
  $\hat{\beta} = [(\mathbf{X}^*)^T(\mathbf{X}^*)]^{-1}(\mathbf{X}^*)^T\mathbf{Y}^* = [\mathbf{X}^T\Sigma^{-1}\mathbf{X}]^{-1}(\mathbf{X}^T\Sigma^{-1}\mathbf{Y})$
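A minimal sketch of the GLS computation (simulated heteroscedastic data; the diagonal Sigma and all dimensions are made-up assumptions): whiten with a Cholesky factor of Sigma (any square root of Sigma serves the role of Sigma^{-1/2} here), fit OLS on the transformed data, and confirm it matches the closed form:

import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 2
X = rng.normal(size=(n, p))
Sigma = np.diag(rng.uniform(0.5, 4.0, size=n))   # known relative variances

L = np.linalg.cholesky(Sigma)                    # Sigma = L L^T
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + L @ rng.normal(size=n)       # errors with covariance Sigma

Xs = np.linalg.solve(L, X)                       # whitened design  L^{-1} X
ys = np.linalg.solve(L, y)                       # whitened response L^{-1} y
beta_gls_whitened, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

Si = np.linalg.inv(Sigma)                        # closed form [X' S^-1 X]^-1 X' S^-1 y
beta_gls_direct = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(np.allclose(beta_gls_whitened, beta_gls_direct))   # True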

Normal Linear Regression Models

Distribution Theory
  $Y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \epsilon_i = \mu_i + \epsilon_i$
Assume $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ are i.i.d. $N(0, \sigma^2)$.
  $\Longrightarrow [Y_i \mid x_{i,1}, x_{i,2}, \ldots, x_{i,p}, \beta, \sigma^2] \sim N(\mu_i, \sigma^2)$, independent over $i = 1, 2, \ldots, n$.
Conditioning on $\mathbf{X}$, $\beta$, and $\sigma^2$:
  $\mathbf{Y} = \mathbf{X}\beta + \epsilon$, where $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$

Distribution Theory

  $\mu = (\mu_1, \ldots, \mu_n)^T = E(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \mathbf{X}\beta$

  $\Sigma = Cov(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2\mathbf{I}_n$

That is, $\Sigma_{i,j} = Cov(Y_i, Y_j \mid \mathbf{X}, \beta, \sigma^2) = \sigma^2\delta_{i,j}$.

Apply Moment-Generating Functions (MGFs) to derive
  the joint distribution of $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$, and
  the joint distribution of $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)^T$.


MGF of $\mathbf{Y}$

For the $n$-variate r.v. $\mathbf{Y}$ and a constant $n$-vector $\mathbf{t} = (t_1, \ldots, t_n)^T$,
  $M_{\mathbf{Y}}(\mathbf{t}) = E(e^{\mathbf{t}^T\mathbf{Y}}) = E(e^{t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n})$
  $= E(e^{t_1 Y_1}) \cdot E(e^{t_2 Y_2}) \cdots E(e^{t_n Y_n})$   (by independence)
  $= M_{Y_1}(t_1) \cdot M_{Y_2}(t_2) \cdots M_{Y_n}(t_n)$
  $= \prod_{i=1}^n e^{t_i\mu_i + \frac{1}{2}t_i^2\sigma^2}$
  $= e^{\sum_{i=1}^n t_i\mu_i + \frac{1}{2}\sum_{i,k=1}^n t_i\Sigma_{i,k}t_k} = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$

$\Longrightarrow \mathbf{Y} \sim N_n(\mu, \Sigma)$, multivariate normal with mean $\mu$ and covariance $\Sigma$.


MGF of $\hat{\beta}$

For the $p$-variate r.v. $\hat{\beta}$ and a constant $p$-vector $\tau = (\tau_1, \ldots, \tau_p)^T$,
  $M_{\hat{\beta}}(\tau) = E(e^{\tau^T\hat{\beta}}) = E(e^{\tau_1\hat{\beta}_1 + \tau_2\hat{\beta}_2 + \cdots + \tau_p\hat{\beta}_p})$

Defining $\mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ we can express $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{A}\mathbf{Y}$, and

  $M_{\hat{\beta}}(\tau) = E(e^{\tau^T\hat{\beta}}) = E(e^{\tau^T\mathbf{A}\mathbf{Y}})$
  $= E(e^{\mathbf{t}^T\mathbf{Y}})$, with $\mathbf{t} = \mathbf{A}^T\tau$
  $= M_{\mathbf{Y}}(\mathbf{t}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$

MGF of $\hat{\beta}$

For $\hat{\beta}$:
  $M_{\hat{\beta}}(\tau) = E(e^{\tau^T\hat{\beta}}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$

Plug in:
  $\mathbf{t} = \mathbf{A}^T\tau = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\tau$
  $\mu = \mathbf{X}\beta$
  $\Sigma = \sigma^2\mathbf{I}_n$
Gives:
  $\mathbf{t}^T\mu = \tau^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\beta = \tau^T\beta$
  $\mathbf{t}^T\Sigma\mathbf{t} = \tau^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T[\sigma^2\mathbf{I}_n]\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\tau = \tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau$
So the MGF of $\hat{\beta}$ is
  $M_{\hat{\beta}}(\tau) = e^{\tau^T\beta + \frac{1}{2}\tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau}$


Marginal Distributions of Least Squares Estimates

Because
  $\hat{\beta} \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$
the marginal distribution of each $\hat{\beta}_j$ is
  $\hat{\beta}_j \sim N(\beta_j, \sigma^2 C_{j,j})$
where $C_{j,j}$ is the $j$th diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$.
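This marginal can be checked by simulation (an illustrative sketch; the design, sigma, and number of replications are arbitrary assumptions): the empirical variance of each coefficient across repeated draws of y should approach sigma^2 C_{j,j}:

import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 40, 3, 0.5
X = rng.normal(size=(n, p))                 # fixed design across replications
beta = np.array([1.0, 0.0, -1.0])
C = np.linalg.inv(X.T @ X)

# Refit OLS on 10,000 independent draws of y = X beta + noise
draws = np.array([
    np.linalg.lstsq(X, X @ beta + sigma * rng.normal(size=n), rcond=None)[0]
    for _ in range(10000)
])
print(draws.var(axis=0))            # empirical variances of beta_hat_j
print(sigma**2 * np.diag(C))        # theoretical sigma^2 C_{j,j}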


The Q-R Decomposition of X

Consider expressing the $(n \times p)$ matrix $\mathbf{X}$ of explanatory variables as
  $\mathbf{X} = \mathbf{Q}\mathbf{R}$
where
  $\mathbf{Q}$ is an $(n \times p)$ orthonormal matrix, i.e., $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_p$, and
  $\mathbf{R}$ is a $(p \times p)$ upper-triangular matrix.
The columns of $\mathbf{Q} = [\mathbf{Q}_{[1]}, \mathbf{Q}_{[2]}, \ldots, \mathbf{Q}_{[p]}]$ can be constructed by performing the Gram-Schmidt orthonormalization procedure on the columns of $\mathbf{X} = [\mathbf{X}_{[1]}, \mathbf{X}_{[2]}, \ldots, \mathbf{X}_{[p]}]$.


If
  $\mathbf{R} = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,p-1} & r_{1,p} \\ 0 & r_{2,2} & \cdots & r_{2,p-1} & r_{2,p} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & r_{p-1,p-1} & r_{p-1,p} \\ 0 & 0 & \cdots & 0 & r_{p,p} \end{pmatrix}$, then

  $\mathbf{X}_{[1]} = \mathbf{Q}_{[1]}r_{1,1}$
  $\Longrightarrow r_{1,1}^2 = \mathbf{X}_{[1]}^T\mathbf{X}_{[1]}$ and $\mathbf{Q}_{[1]} = \mathbf{X}_{[1]}/r_{1,1}$

  $\mathbf{X}_{[2]} = \mathbf{Q}_{[1]}r_{1,2} + \mathbf{Q}_{[2]}r_{2,2}$
  $\Longrightarrow \mathbf{Q}_{[1]}^T\mathbf{X}_{[2]} = \mathbf{Q}_{[1]}^T\mathbf{Q}_{[1]}r_{1,2} + \mathbf{Q}_{[1]}^T\mathbf{Q}_{[2]}r_{2,2} = 1 \cdot r_{1,2} + 0 \cdot r_{2,2} = r_{1,2}$ (known, since $\mathbf{Q}_{[1]}$ is specified)

With $r_{1,2}$ and $\mathbf{Q}_{[1]}$ specified, we can solve for $r_{2,2}$:
  $\mathbf{Q}_{[2]}r_{2,2} = \mathbf{X}_{[2]} - \mathbf{Q}_{[1]}r_{1,2}$
Taking the squared norm of both sides:
  $r_{2,2}^2 = \mathbf{X}_{[2]}^T\mathbf{X}_{[2]} - 2r_{1,2}\mathbf{Q}_{[1]}^T\mathbf{X}_{[2]} + r_{1,2}^2$
(all terms on the RHS are known)
With $r_{2,2}$ specified,
  $\mathbf{Q}_{[2]} = r_{2,2}^{-1}\left(\mathbf{X}_{[2]} - r_{1,2}\mathbf{Q}_{[1]}\right)$
Etc. (solve for the remaining elements of $\mathbf{R}$ and columns of $\mathbf{Q}$).

With the Q-R Decomposition

  $\mathbf{X} = \mathbf{Q}\mathbf{R}$  ($\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_p$, and $\mathbf{R}$ is $p \times p$ upper-triangular)

  $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}$  (plug in $\mathbf{X} = \mathbf{Q}\mathbf{R}$ and simplify)

  $Cov(\hat{\beta}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2\mathbf{R}^{-1}(\mathbf{R}^{-1})^T$

  $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \mathbf{Q}\mathbf{Q}^T$  (giving $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$ and $\hat{\epsilon} = (\mathbf{I}_n - \mathbf{H})\mathbf{y}$)
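In code (a simulated-data sketch, not from the slides): np.linalg.qr returns the reduced factorization used here, and a triangular solve of R beta_hat = Q^T y reproduces the normal-equation solution without ever forming (X^T X)^{-1}:

import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

Q, R = np.linalg.qr(X)                       # reduced QR: Q is n x p, R is p x p
beta_qr = np.linalg.solve(R, Q.T @ y)        # solve R beta_hat = Q^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations, for comparison
print(np.allclose(beta_qr, beta_ne))         # True

H = Q @ Q.T                                  # hat matrix H = Q Q^T
print(np.allclose(H, X @ np.linalg.solve(X.T @ X, X.T)))   # True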


More Distribution Theory

Assume $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where the $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e.,
  $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$, or $\mathbf{y} \sim N_n(\mathbf{X}\beta, \sigma^2\mathbf{I}_n)$.

Theorem*: For any $(m \times n)$ matrix $\mathbf{A}$ of rank $m \leq n$, the random normal vector $\mathbf{y}$ transformed by $\mathbf{A}$,
  $\mathbf{z} = \mathbf{A}\mathbf{y}$,
is also a random normal vector:
  $\mathbf{z} \sim N_m(\mu_z, \Sigma_z)$
where $\mu_z = \mathbf{A}E(\mathbf{y}) = \mathbf{A}\mathbf{X}\beta$ and $\Sigma_z = \mathbf{A}\,Cov(\mathbf{y})\,\mathbf{A}^T = \sigma^2\mathbf{A}\mathbf{A}^T$.

Earlier, $\mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ yields the distribution of $\hat{\beta} = \mathbf{A}\mathbf{y}$.
With a different definition of $\mathbf{A}$ (and $\mathbf{z}$) we give an easy proof of:

Theorem: For the normal linear regression model
  $\mathbf{y} = \mathbf{X}\beta + \epsilon$,
where $\mathbf{X}$ ($n \times p$) has rank $p$ and $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$:

(a) $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and $\hat{\epsilon} = \mathbf{y} - \mathbf{X}\hat{\beta}$ are independent r.v.s
(b) $\hat{\beta} \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$
(c) $\sum_{i=1}^n \hat{\epsilon}_i^2 = \hat{\epsilon}^T\hat{\epsilon} \sim \sigma^2\chi^2_{n-p}$ (a scaled Chi-squared r.v.)
(d) For each $j = 1, 2, \ldots, p$:
  $t_j = \dfrac{\hat{\beta}_j - \beta_j}{\hat{\sigma}\sqrt{C_{j,j}}} \sim t_{n-p}$ ($t$ distribution)
where
  $\hat{\sigma}^2 = \dfrac{1}{n-p}\sum_{i=1}^n \hat{\epsilon}_i^2$ and $C_{j,j} = [(\mathbf{X}^T\mathbf{X})^{-1}]_{j,j}$.
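As an illustration of (d), a short sketch computing the per-coefficient t statistics under the null value beta_j = 0 (simulated data; in practice one would compare each t_j against t_{n-p} quantiles, e.g., via scipy.stats.t):

import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
eps_hat = y - X @ beta_hat
sigma2_hat = eps_hat @ eps_hat / (n - p)           # unbiased estimate of sigma^2
C = np.linalg.inv(X.T @ X)
t_stats = beta_hat / np.sqrt(sigma2_hat * np.diag(C))   # (beta_hat_j - 0) / SE_j
print(t_stats)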

Proof: Note that (d) follows immediately from (a), (b), and (c).

Define $\mathbf{A} = \begin{pmatrix} \mathbf{Q}^T \\ \mathbf{W}^T \end{pmatrix}$, where
  $\mathbf{A}$ is an $(n \times n)$ orthogonal matrix (i.e., $\mathbf{A}^T = \mathbf{A}^{-1}$), and
  $\mathbf{Q}$ is the column-orthonormal matrix in a Q-R decomposition of $\mathbf{X}$.
Note: $\mathbf{W}$ can be constructed by continuing the Gram-Schmidt orthonormalization process (which was used to construct $\mathbf{Q}$ from $\mathbf{X}$) with $\mathbf{X}^* = [\,\mathbf{X} \;\; \mathbf{I}_n\,]$.

Then, consider
  $\mathbf{z} = \mathbf{A}\mathbf{y} = \begin{pmatrix} \mathbf{Q}^T\mathbf{y} \\ \mathbf{W}^T\mathbf{y} \end{pmatrix} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix}$, with $\mathbf{z}_Q$ of dimension $(p \times 1)$ and $\mathbf{z}_W$ of dimension $((n-p) \times 1)$.

The distribution of $\mathbf{z} = \mathbf{A}\mathbf{y}$ is $N_n(\mu_z, \Sigma_z)$, where
  $\mu_z = \mathbf{A}\mathbf{X}\beta = \begin{pmatrix} \mathbf{Q}^T \\ \mathbf{W}^T \end{pmatrix}\mathbf{Q}[\mathbf{R}\beta] = \begin{pmatrix} \mathbf{Q}^T\mathbf{Q} \\ \mathbf{W}^T\mathbf{Q} \end{pmatrix}[\mathbf{R}\beta] = \begin{pmatrix} \mathbf{I}_p \\ \mathbf{0}_{(n-p)\times p} \end{pmatrix}[\mathbf{R}\beta] = \begin{pmatrix} \mathbf{R}\beta \\ \mathbf{0}_{n-p} \end{pmatrix}$
  $\Sigma_z = \mathbf{A}[\sigma^2\mathbf{I}_n]\mathbf{A}^T = \sigma^2[\mathbf{A}\mathbf{A}^T] = \sigma^2\mathbf{I}_n$, since $\mathbf{A}^T = \mathbf{A}^{-1}$.


Thus
  $\mathbf{z} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix} \sim N_n\left(\begin{pmatrix} \mathbf{R}\beta \\ \mathbf{0}_{n-p} \end{pmatrix}, \sigma^2\mathbf{I}_n\right)$
so that
  $\mathbf{z}_Q \sim N_p(\mathbf{R}\beta, \sigma^2\mathbf{I}_p)$,
  $\mathbf{z}_W \sim N_{n-p}(\mathbf{0}_{n-p}, \sigma^2\mathbf{I}_{n-p})$, and
  $\mathbf{z}_Q$ and $\mathbf{z}_W$ are independent.

The Theorem follows by showing:
(a*) $\hat{\beta} = \mathbf{R}^{-1}\mathbf{z}_Q$ and $\hat{\epsilon} = \mathbf{W}\mathbf{z}_W$ (i.e., $\hat{\beta}$ and $\hat{\epsilon}$ are functions of different, independent vectors).
(b*) Deducing the distribution of $\hat{\beta} = \mathbf{R}^{-1}\mathbf{z}_Q$ by applying Theorem* with $\mathbf{A} = \mathbf{R}^{-1}$ and $\mathbf{y} = \mathbf{z}_Q$.
(c*) $\hat{\epsilon}^T\hat{\epsilon} = \mathbf{z}_W^T\mathbf{z}_W$ is a sum of $(n-p)$ squared r.v.s which are i.i.d. $N(0, \sigma^2)$, hence $\sim \sigma^2\chi^2_{n-p}$, a scaled Chi-squared r.v.

Proof of (a*)

$\hat{\beta} = \mathbf{R}^{-1}\mathbf{z}_Q$ follows from $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and $\mathbf{X} = \mathbf{Q}\mathbf{R}$ with $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_p$.

  $\hat{\epsilon} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\beta}$
  $= \mathbf{y} - (\mathbf{Q}\mathbf{R})(\mathbf{R}^{-1}\mathbf{z}_Q)$
  $= \mathbf{y} - \mathbf{Q}\mathbf{z}_Q$
  $= \mathbf{y} - \mathbf{Q}\mathbf{Q}^T\mathbf{y} = (\mathbf{I}_n - \mathbf{Q}\mathbf{Q}^T)\mathbf{y}$
  $= \mathbf{W}\mathbf{W}^T\mathbf{y}$  (since $\mathbf{I}_n = \mathbf{A}^T\mathbf{A} = \mathbf{Q}\mathbf{Q}^T + \mathbf{W}\mathbf{W}^T$)
  $= \mathbf{W}\mathbf{z}_W$


Maximum-Likelihood Estimation

Consider the normal linear regression model:
  $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where the $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e.,
  $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$, or $\mathbf{y} \sim N_n(\mathbf{X}\beta, \sigma^2\mathbf{I}_n)$.

Definitions:
  The likelihood function is
    $L(\beta, \sigma^2) = p(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)$
  where $p(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)$ is the joint probability density function (pdf) of the conditional distribution of $\mathbf{y}$ given the data $\mathbf{X}$ (known) and the parameters $(\beta, \sigma^2)$ (unknown).
  The maximum likelihood estimates of $(\beta, \sigma^2)$ are the values maximizing $L(\beta, \sigma^2)$, i.e., those which make the observed data $\mathbf{y}$ most likely in terms of its pdf.

Because the $y_i$ are independent r.v.s with $y_i \sim N(\mu_i, \sigma^2)$, where $\mu_i = \sum_{j=1}^p \beta_j x_{i,j}$,

  $L(\beta, \sigma^2) = \prod_{i=1}^n p(y_i \mid \beta, \sigma^2) = \prod_{i=1}^n \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left(y_i - \sum_{j=1}^p \beta_j x_{i,j}\right)^2}$
  $= \dfrac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2}(\mathbf{y} - \mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y} - \mathbf{X}\beta)}$

The maximum likelihood estimates $(\hat{\beta}, \hat{\sigma}^2)$ maximize the log-likelihood function (dropping constant terms):
  $\log L(\beta, \sigma^2) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2}(\mathbf{y} - \mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y} - \mathbf{X}\beta) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}Q(\beta)$
where $Q(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$ (the Least-Squares Criterion!).

So the OLS estimate $\hat{\beta}$ is also the ML estimate.

The ML estimate $\hat{\sigma}^2$ of $\sigma^2$ solves
  $\dfrac{\partial}{\partial(\sigma^2)}\log L(\hat{\beta}, \sigma^2) = 0$, i.e., $-\dfrac{n}{2}\cdot\dfrac{1}{\sigma^2} + \dfrac{1}{2}(\sigma^2)^{-2}Q(\hat{\beta}) = 0$
  $\Longrightarrow \hat{\sigma}^2_{ML} = Q(\hat{\beta})/n = \left(\sum_{i=1}^n \hat{\epsilon}_i^2\right)/n$ (biased!)
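A small simulated check of this bias (an illustrative sketch; n, p, and sigma are arbitrary assumptions): the ML estimate divides the residual sum of squares by n, while dividing by n - p gives the unbiased estimate used earlier for the t statistics:

import numpy as np

rng = np.random.default_rng(6)
n, p, sigma = 200, 4, 1.5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + sigma * rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)              # Q(beta_hat)

sigma2_ml = rss / n                                # ML estimate: biased low
sigma2_unbiased = rss / (n - p)                    # degrees-of-freedom correction
print(sigma2_ml, sigma2_unbiased, sigma**2)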

Generalized M Estimation

For data $(\mathbf{y}, \mathbf{X})$, fit the linear regression model
  $y_i = \mathbf{x}_i^T\beta + \epsilon_i$, $i = 1, 2, \ldots, n$
by specifying $\hat{\beta}$ to minimize
  $Q(\beta) = \sum_{i=1}^n h(y_i, \mathbf{x}_i, \beta, \sigma^2)$

The choice of the function $h(\cdot)$ distinguishes different estimators:
(1) Least Squares: $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = (y_i - \mathbf{x}_i^T\beta)^2$
(2) Mean Absolute Deviation (MAD): $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = |y_i - \mathbf{x}_i^T\beta|$
(3) Maximum Likelihood (ML): Assume the $y_i$ are independent with pdfs $p(y_i \mid \beta, \mathbf{x}_i, \sigma^2)$, and take
  $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = -\log p(y_i \mid \beta, \mathbf{x}_i, \sigma^2)$
(4) Robust M-Estimator: $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = \chi(y_i - \mathbf{x}_i^T\beta)$, where $\chi(\cdot)$ is even and monotone increasing on $(0, \infty)$.

(5) Quantile Estimator: For $\tau$: $0 < \tau < 1$, a fixed quantile,
  $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = \begin{cases} \tau\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i \geq \mathbf{x}_i^T\beta \\ (1 - \tau)\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i < \mathbf{x}_i^T\beta \end{cases}$
E.g., $\tau = 0.90$ corresponds to the 90th quantile / upper decile; $\tau = 0.50$ corresponds to the MAD estimator.
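As one way to see this criterion in action, here is a hedged sketch that minimizes the quantile loss with scipy's general-purpose Nelder-Mead routine (a dedicated quantile-regression solver would be preferable in practice; the simulated data, tau = 0.90, and OLS starting point are arbitrary assumptions):

import numpy as np
from scipy.optimize import minimize

def quantile_loss(beta, X, y, tau):
    # Q(beta) = sum_i h(y_i, x_i, beta) for the quantile criterion above
    r = y - X @ beta
    return np.sum(np.where(r >= 0, tau * np.abs(r), (1 - tau) * np.abs(r)))

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])          # intercept + slope
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)    # heavy-tailed errors

beta0 = np.linalg.lstsq(X, y, rcond=None)[0]                   # OLS starting point
fit90 = minimize(quantile_loss, beta0, args=(X, y, 0.90), method="Nelder-Mead")
print(fit90.x)    # approximate 90th-percentile regression line

Setting tau = 0.50 in the same call recovers the MAD fit, consistent with the remark above.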


MIT OpenCourseWare
http://ocw.mit.edu

18.S096 Topics in Mathematics with Applications in Finance


Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
