MIT 18.S096
Dr. Kempthorne
Fall 2013
Outline
1 Regression Analysis
Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation
Matrix Notation
y = (y1, y2, . . . , yn)T : the (n × 1) response vector
X : the (n × p) design matrix with (i, j) entry xi,j, i.e., ith row (xi,1, xi,2, . . . , xi,p)
β = (β1, β2, . . . , βp)T : the (p × 1) coefficient vector
The least-squares criterion is
Q(β) = Σ_{i=1}^n [yi − (xi,1 β1 + xi,2 β2 + · · · + xi,p βp)]² = (y − Xβ)T (y − Xβ)
The OLS estimate β̂ solves ∂Q(β)/∂βj = 0, j = 1, 2, . . . , p:
∂Q(β)/∂βj = ∂/∂βj Σ_{i=1}^n [yi − (xi,1 β1 + xi,2 β2 + · · · + xi,p βp)]²
= Σ_{i=1}^n 2(−xi,j)[yi − (xi,1 β1 + xi,2 β2 + · · · + xi,p βp)]
= −2(X[j])T (y − Xβ), where X[j] is the jth column of X
Setting all p partial derivatives to zero gives the normal equations XT (y − Xβ̂) = 0p, so β̂ = (XT X)−1 XT y when XT X is invertible.
N.B. The least-squares residual vector ε̂ = y − Xβ̂ is orthogonal to the
column space of X (checked numerically in the sketch below).
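As a numerical illustration (not part of the original slides; the simulated data and dimensions are arbitrary), the following Python sketch solves the normal equations for β̂ and checks that the residual vector is orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                       # sample size and number of regressors (arbitrary)
X = rng.normal(size=(n, p))         # simulated design matrix
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS via the normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals are orthogonal to the column space of X: X^T (y - X beta_hat) = 0
resid = y - X @ beta_hat
print(beta_hat)                     # close to beta_true
print(X.T @ resid)                  # zero up to floating-point error
```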
Gauss-Markov Theorem
For known constants c1, c2, . . . , cp, cp+1, consider the problem of
estimating
θ = c1 β1 + c2 β2 + · · · + cp βp + cp+1.
Under the Gauss-Markov assumptions, the estimator
θ̂ = c1 β̂1 + c2 β̂2 + · · · + cp β̂p + cp+1,
where β̂1, β̂2, . . . , β̂p are the least-squares estimates, is:
1) An Unbiased Estimator of θ.
2) A Linear Estimator of θ, that is,
θ̂ = Σ_{i=1}^n bi yi + cp+1, for some known (given X) constants bi (computed explicitly in the sketch below).
3) The Best Linear Unbiased Estimator (BLUE) of θ: any other linear unbiased estimator θ̃ = Σ_{i=1}^n fi yi + cp+1 has Var(θ̃) ≥ Var(θ̂).
Proof sketch: write f = b + d, where b = X(XT X)−1 c is the weight vector of θ̂. If θ̃ is unbiased, then E(θ̃) = θ for every β, which forces XT d = 0p, i.e., d is orthogonal to the column space of X; since b lies in that column space, b is orthogonal to d.
The orthogonality of b to d implies
Var(θ̃) = σ² ‖b + d‖² = σ² (‖b‖² + ‖d‖²) ≥ σ² ‖b‖² = Var(θ̂).
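The linearity claim can be made concrete: since θ̂ = cT β̂ + cp+1 = cT (XT X)−1 XT y + cp+1, the weight vector is b = X(XT X)−1 c. A short Python sketch (my own illustration, on simulated data) verifying this numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

c = np.array([0.5, -1.0, 2.0])      # c1, ..., cp (arbitrary)
c_const = 3.0                       # c_{p+1}

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
theta_hat = c @ beta_hat + c_const

# Linearity: theta_hat = sum_i b_i y_i + c_{p+1} with b = X (X^T X)^{-1} c
b = X @ np.linalg.solve(X.T @ X, c)
print(np.isclose(theta_hat, b @ y + c_const))   # True
```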
Conditioning on X, β, and σ²
Y = Xβ + ε, where ε = (ε1, ε2, . . . , εn)T ∼ Nn(0n, σ² In)
Distribution Theory
µ = (µ1, . . . , µn)T = E(Y | X, β, σ²) = Xβ
Σ = Cov(Y | X, β, σ²) = diag(σ², σ², . . . , σ²) = σ² In
That is, Σi,j = Cov(Yi, Yj | X, β, σ²) = σ² × δi,j.
MGF of Y
For the n-variate r.v. Y and a constant n-vector t = (t1, . . . , tn)T,
MY(t) = E(e^{tT Y}) = E(e^{t1 Y1 + t2 Y2 + · · · + tn Yn})
= E(e^{t1 Y1}) · E(e^{t2 Y2}) · · · E(e^{tn Yn})   (by independence of the Yi)
= MY1(t1) · MY2(t2) · · · MYn(tn)
= Π_{i=1}^n e^{ti µi + (1/2) ti² σ²}
= e^{Σ_{i=1}^n ti µi + (1/2) Σ_{i,k=1}^n ti Σi,k tk} = e^{tT µ + (1/2) tT Σ t}
=⇒ Y ∼ Nn(µ, Σ),
the multivariate normal with mean µ and covariance Σ.
MGF of β̂
For the p-variate r.v. β̂ and a constant p-vector τ = (τ1, . . . , τp)T,
Mβ̂(τ) = E(e^{τT β̂}) = E(e^{τ1 β̂1 + τ2 β̂2 + · · · + τp β̂p})
Since β̂ = Ay with A = (XT X)−1 XT, we have τT β̂ = (AT τ)T Y, so with t = AT τ,
Mβ̂(τ) = E(e^{tT Y}) = e^{tT µ + (1/2) tT Σ t}
Plug in:
t = AT τ = X(XT X)−1 τ
µ = Xβ
Σ = σ² In
Gives:
tT µ = τT (XT X)−1 XT Xβ = τT β
tT Σ t = τT (XT X)−1 XT [σ² In] X(XT X)−1 τ = τT [σ² (XT X)−1] τ
So the MGF of β̂ is
Mβ̂(τ) = e^{τT β + (1/2) τT [σ² (XT X)−1] τ}
=⇒ β̂ ∼ Np(β, σ² (XT X)−1)
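This distribution is easy to check by simulation. A minimal Python sketch (my own, with an arbitrary fixed design): draw y ∼ Nn(Xβ, σ² In) repeatedly, form β̂ = Ay, and compare the empirical mean and covariance of β̂ with β and σ²(XT X)−1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 40, 2, 0.5
X = rng.normal(size=(n, p))         # design matrix held fixed across replications
beta = np.array([1.0, -2.0])
A = np.linalg.solve(X.T @ X, X.T)   # A = (X^T X)^{-1} X^T

# Monte Carlo: beta_hat = A y for many independent draws y ~ N(X beta, sigma^2 I)
draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=n))
                  for _ in range(20000)])

print(draws.mean(axis=0))                       # close to beta
print(np.cov(draws.T))                          # close to sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```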
Since ε ∼ Nn(0n, σ² In), equivalently y ∼ Nn(Xβ, σ² In), the following applies.
Theorem* For any (m × n) matrix A of rank m ≤ n, the normal random
vector y transformed by A,
z = Ay,
is also a normal random vector:
z ∼ Nm(µz, Σz)
where µz = A E(y) = AXβ,
and Σz = A Cov(y) AT = σ² A AT.
Earlier, A = (XT X)−1 XT yielded the distribution of β̂ = Ay.
With a different definition of A (and z) we get an easy proof of the distribution theory for β̂ and the residuals:
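The theorem is also easy to confirm by simulation. A small sketch (my own, with an arbitrary transform A and simulated data): the empirical mean and covariance of z = Ay should match AXβ and σ² A AT.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, sigma = 30, 4, 2, 1.0
X = rng.normal(size=(n, p))
beta = np.array([0.5, 1.5])
A = rng.normal(size=(m, n))         # an arbitrary (m x n) transform, full rank w.p. 1

# Repeatedly draw y ~ N(X beta, sigma^2 I) and transform: z = A y
z = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=n))
              for _ in range(20000)])

# Both differences are small (Monte Carlo error only)
print(np.abs(z.mean(axis=0) - A @ X @ beta).max())     # mean mu_z = A X beta
print(np.abs(np.cov(z.T) - sigma**2 * A @ A.T).max())  # cov Sigma_z = sigma^2 A A^T
```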
Proof: Note that (d) follows immediately from (a), (b), (c).
Define the (n × n) matrix A by stacking QT on top of WT:
A = [QT; WT], where
A is an (n × n) orthogonal matrix (i.e., AT = A−1),
Q is the (n × p) column-orthonormal matrix in a QR decomposition of X.
Note: W can be constructed by continuing the Gram-Schmidt
orthonormalization process (which was used to construct Q from
X) with X∗ = [ X In ].
Then consider
z = Ay = [QT y; WT y] = [zQ; zW], where zQ is (p × 1) and zW is ((n − p) × 1).
Proof of (a*)
β̂ = R−1 zQ follows from β̂ = (XT X)−1 XT y and X = QR with QT Q = Ip:
β̂ = (RT QT Q R)−1 RT QT y = (RT R)−1 RT QT y = R−1 (RT)−1 RT QT y = R−1 QT y = R−1 zQ,
using the invertibility of the (p × p) upper-triangular R (X of full rank).
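A quick numerical confirmation (my own sketch, on arbitrary simulated data): compute β̂ both from the reduced QR decomposition, as R−1 zQ, and from the normal equations, and compare.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Reduced QR: X = QR with Q (n x p), Q^T Q = I_p, R (p x p) upper triangular
Q, R = np.linalg.qr(X)
z_Q = Q.T @ y

beta_qr = np.linalg.solve(R, z_Q)               # beta_hat = R^{-1} z_Q
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)     # beta_hat from the normal equations
print(np.allclose(beta_qr, beta_ne))            # True
```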
Maximum-Likelihood Estimation
Consider the normal linear regression model:
y = Xβ + ε, where the εi are i.i.d. N(0, σ²), i.e.,
ε ∼ Nn(0n, σ² In)
or y ∼ Nn(Xβ, σ² In)
Definitions:
The likelihood function is
L(β, σ²) = p(y | X, β, σ²)
where p(y | X, β, σ²) is the joint probability density function
(pdf) of the conditional distribution of y given the data X
(known) and the parameters (β, σ²) (unknown).
The maximum likelihood estimates of (β, σ 2 ) are the values
maximizing L(β, σ 2 ), i.e., those which make the observed
data y most likely in terms of its pdf.
Because the yi are independent r.v.'s with yi ∼ N(µi, σ²), where
µi = Σ_{j=1}^p βj xi,j,
L(β, σ²) = Π_{i=1}^n p(yi | β, σ²)
= Π_{i=1}^n [1/√(2πσ²)] e^{−(1/(2σ²)) (yi − Σ_{j=1}^p βj xi,j)²}
= (2πσ²)^{−n/2} e^{−(1/2) (y − Xβ)T (σ² In)−1 (y − Xβ)}
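The equality of the product form and the matrix form can be verified numerically. A small Python sketch (my own, on simulated data) evaluating the log-likelihood both ways:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, p, sigma2 = 20, 2, 0.8
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5])
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# Log-likelihood as a sum of univariate normal log-densities
ll_sum = norm.logpdf(y, loc=X @ beta, scale=np.sqrt(sigma2)).sum()

# The same quantity in matrix form: -(n/2) log(2 pi sigma^2) - Q(beta)/(2 sigma^2)
r = y - X @ beta
ll_mat = -0.5 * n * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)

print(np.isclose(ll_sum, ll_mat))               # True
```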
The maximum likelihood estimates (β̂, σ̂²) maximize the
log-likelihood function (dropping constant terms):
log L(β, σ²) = −(n/2) log(σ²) − (1/2) (y − Xβ)T (σ² In)−1 (y − Xβ)
= −(n/2) log(σ²) − Q(β)/(2σ²)
where Q(β) = (y − Xβ)T (y − Xβ) (the "Least-Squares Criterion"!)
The OLS estimate β̂ is also the ML estimate.
The ML estimate of σ² solves
∂ log L(β̂, σ²)/∂(σ²) = 0, i.e., −(n/2)(1/σ²) − (1/2)(−1)(σ²)−2 Q(β̂) = 0
=⇒ σ̂²ML = Q(β̂)/n = (Σ_{i=1}^n ε̂i²)/n   (biased!)
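The bias is visible in a small simulation. A hedged Python sketch (my own; the factor E[Q(β̂)] = (n − p)σ² is standard theory, not stated on these slides): the Monte Carlo mean of σ̂²ML is close to σ²(n − p)/n, while rescaling by n/(n − p) removes the bias.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma2 = 15, 3, 2.0
X = rng.normal(size=(n, p))
beta = np.zeros(p)

def sigma2_ml():
    """One replication: ML estimate Q(beta_hat)/n of the error variance."""
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ beta_hat
    return (r @ r) / n

est = np.array([sigma2_ml() for _ in range(50000)])
print(est.mean())                   # close to sigma2 * (n - p)/n = 1.6, not 2.0
print(est.mean() * n / (n - p))     # bias-corrected, close to sigma2 = 2.0
```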
Generalized M Estimation
For data (y, X), fit the linear regression model
yi = xiT β + εi, i = 1, 2, . . . , n,
by specifying β = β̂ to minimize
Q(β) = Σ_{i=1}^n h(yi, xi, β, σ²).
The choice of the function h(·) distinguishes different estimators.
(1) Least Squares: h(yi, xi, β, σ²) = (yi − xiT β)²
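To make the h(·) abstraction concrete, here is a minimal Python sketch (my own; it simplifies h to a function of the residual alone, and the data are simulated): a generic numerical minimizer of Q(β) with a pluggable loss, shown recovering the OLS solution for the squared-error choice (1).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, p = 60, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 3.0]) + rng.normal(size=n)

def Q(beta, h):
    """Generalized M-estimation criterion: sum_i h(y_i - x_i^T beta)."""
    return h(y - X @ beta).sum()

h_ls = lambda r: r**2                               # choice (1): least squares

beta_m = minimize(Q, np.zeros(p), args=(h_ls,)).x   # numerical M-estimate
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)        # closed-form OLS
print(np.allclose(beta_m, beta_ols, atol=1e-4))     # True: squared loss gives OLS
```

Swapping h_ls for a robust loss (e.g., a Huber-type function) changes the estimator without changing the fitting machinery.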
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.