MIT 18.S096
Dr. Kempthorne
Fall 2013
Outline
1 Regression Analysis
Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation
Matrix Notation
y = (y1, y2, . . . , yn)T : the (n × 1) response vector
X : the (n × p) design matrix with (i, j) entry xi,j, i.e., ith row (xi,1, xi,2, . . . , xi,p)
β = (β1, β2, . . . , βp)T : the (p × 1) coefficient vector
The least-squares criterion is
Q(β) = Σ_{i=1}^n [yi − (xi,1 β1 + xi,2 β2 + · · · + xi,p βp)]² = (y − Xβ)T (y − Xβ)
The OLS estimate β̂ solves ∂Q(β)/∂βj = 0, j = 1, 2, . . . , p:
∂Q(β)/∂βj = ∂/∂βj Σ_{i=1}^n [yi − (xi,1 β1 + xi,2 β2 + · · · + xi,p βp)]²
= Σ_{i=1}^n 2(−xi,j)[yi − (xi,1 β1 + xi,2 β2 + · · · + xi,p βp)]
= −2(X[j])T (y − Xβ), where X[j] is the jth column of X
Setting all p partial derivatives to zero gives the normal equations XT (y − Xβ̂) = 0p, so β̂ = (XT X)−1 XT y when XT X is invertible.
N.B. The least-squares residual vector ε̂ = y − Xβ̂ is orthogonal to the
column space of X (checked numerically in the sketch below).
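As a numerical illustration (not part of the original slides; the simulated data and dimensions are arbitrary), the following Python sketch solves the normal equations for β̂ and checks that the residual vector is orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                       # sample size and number of regressors (arbitrary)
X = rng.normal(size=(n, p))         # simulated design matrix
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS via the normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals are orthogonal to the column space of X: X^T (y - X beta_hat) = 0
resid = y - X @ beta_hat
print(beta_hat)                     # close to beta_true
print(X.T @ resid)                  # zero up to floating-point error
```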
Gauss-Markov Theorem
For known constants c1, c2, . . . , cp, cp+1, consider the problem of
estimating
θ = c1 β1 + c2 β2 + · · · + cp βp + cp+1.
Under the Gauss-Markov assumptions, the estimator
θ̂ = c1 β̂1 + c2 β̂2 + · · · + cp β̂p + cp+1,
where β̂1, β̂2, . . . , β̂p are the least-squares estimates, is:
1) An Unbiased Estimator of θ.
2) A Linear Estimator of θ, that is,
θ̂ = Σ_{i=1}^n bi yi + cp+1, for some known (given X) constants bi (computed explicitly in the sketch below).
3) The Best Linear Unbiased Estimator (BLUE) of θ: any other linear unbiased estimator θ̃ = Σ_{i=1}^n fi yi + cp+1 has Var(θ̃) ≥ Var(θ̂).
Proof sketch: write f = b + d, where b = X(XT X)−1 c is the weight vector of θ̂. If θ̃ is unbiased, then E(θ̃) = θ for every β, which forces XT d = 0p, i.e., d is orthogonal to the column space of X; since b lies in that column space, b is orthogonal to d.
The orthogonality of b to d implies
Var(θ̃) = σ² ‖b + d‖² = σ² (‖b‖² + ‖d‖²) ≥ σ² ‖b‖² = Var(θ̂).
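The linearity claim can be made concrete: since θ̂ = cT β̂ + cp+1 = cT (XT X)−1 XT y + cp+1, the weight vector is b = X(XT X)−1 c. A short Python sketch (my own illustration, on simulated data) verifying this numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

c = np.array([0.5, -1.0, 2.0])      # c1, ..., cp (arbitrary)
c_const = 3.0                       # c_{p+1}

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
theta_hat = c @ beta_hat + c_const

# Linearity: theta_hat = sum_i b_i y_i + c_{p+1} with b = X (X^T X)^{-1} c
b = X @ np.linalg.solve(X.T @ X, c)
print(np.isclose(theta_hat, b @ y + c_const))   # True
```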
Conditioning on X, β, and σ²
Y = Xβ + ε, where ε = (ε1, ε2, . . . , εn)T ∼ Nn(0n, σ² In)
Distribution Theory
µ = (µ1, . . . , µn)T = E(Y | X, β, σ²) = Xβ
Σ = Cov(Y | X, β, σ²) = diag(σ², σ², . . . , σ²) = σ² In
That is, Σi,j = Cov(Yi, Yj | X, β, σ²) = σ² × δi,j.
MGF of Y
For the n-variate r.v. Y and a constant n-vector t = (t1, . . . , tn)T,
MY(t) = E(e^{tT Y}) = E(e^{t1 Y1 + t2 Y2 + · · · + tn Yn})
= E(e^{t1 Y1}) · E(e^{t2 Y2}) · · · E(e^{tn Yn})   (by independence of the Yi)
= MY1(t1) · MY2(t2) · · · MYn(tn)
= Π_{i=1}^n e^{ti µi + (1/2) ti² σ²}
= e^{Σ_{i=1}^n ti µi + (1/2) Σ_{i,k=1}^n ti Σi,k tk} = e^{tT µ + (1/2) tT Σ t}
=⇒ Y ∼ Nn(µ, Σ),
the multivariate normal with mean µ and covariance Σ.
MGF of β̂
For the p-variate r.v. β̂ and a constant p-vector τ = (τ1, . . . , τp)T,
Mβ̂(τ) = E(e^{τT β̂}) = E(e^{τ1 β̂1 + τ2 β̂2 + · · · + τp β̂p})
Since β̂ = Ay with A = (XT X)−1 XT, we have τT β̂ = (AT τ)T Y, so with t = AT τ,
Mβ̂(τ) = E(e^{tT Y}) = e^{tT µ + (1/2) tT Σ t}
Plug in:
t = AT τ = X(XT X)−1 τ
µ = Xβ
Σ = σ² In
Gives:
tT µ = τT (XT X)−1 XT Xβ = τT β
tT Σ t = τT (XT X)−1 XT [σ² In] X(XT X)−1 τ = τT [σ² (XT X)−1] τ
So the MGF of β̂ is
Mβ̂(τ) = e^{τT β + (1/2) τT [σ² (XT X)−1] τ}
=⇒ β̂ ∼ Np(β, σ² (XT X)−1)
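This distribution is easy to check by simulation. A minimal Python sketch (my own, with an arbitrary fixed design): draw y ∼ Nn(Xβ, σ² In) repeatedly, form β̂ = Ay, and compare the empirical mean and covariance of β̂ with β and σ²(XT X)−1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 40, 2, 0.5
X = rng.normal(size=(n, p))         # design matrix held fixed across replications
beta = np.array([1.0, -2.0])
A = np.linalg.solve(X.T @ X, X.T)   # A = (X^T X)^{-1} X^T

# Monte Carlo: beta_hat = A y for many independent draws y ~ N(X beta, sigma^2 I)
draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=n))
                  for _ in range(20000)])

print(draws.mean(axis=0))                       # close to beta
print(np.cov(draws.T))                          # close to sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```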
Since ε ∼ Nn(0n, σ² In), equivalently y ∼ Nn(Xβ, σ² In), the following applies.
Theorem* For any (m × n) matrix A of rank m ≤ n, the normal random
vector y transformed by A,
z = Ay,
is also a normal random vector:
z ∼ Nm(µz, Σz)
where µz = A E(y) = AXβ,
and Σz = A Cov(y) AT = σ² A AT.
Earlier, A = (XT X)−1 XT yielded the distribution of β̂ = Ay.
With a different definition of A (and z) we get an easy proof of the distribution theory for β̂ and the residuals:
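The theorem is also easy to confirm by simulation. A small sketch (my own, with an arbitrary transform A and simulated data): the empirical mean and covariance of z = Ay should match AXβ and σ² A AT.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, sigma = 30, 4, 2, 1.0
X = rng.normal(size=(n, p))
beta = np.array([0.5, 1.5])
A = rng.normal(size=(m, n))         # an arbitrary (m x n) transform, full rank w.p. 1

# Repeatedly draw y ~ N(X beta, sigma^2 I) and transform: z = A y
z = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=n))
              for _ in range(20000)])

# Both differences are small (Monte Carlo error only)
print(np.abs(z.mean(axis=0) - A @ X @ beta).max())     # mean mu_z = A X beta
print(np.abs(np.cov(z.T) - sigma**2 * A @ A.T).max())  # cov Sigma_z = sigma^2 A A^T
```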
Proof: Note that (d) follows immediately from (a), (b), (c).
Define the (n × n) matrix A by stacking QT on top of WT:
A = [QT; WT], where
A is an (n × n) orthogonal matrix (i.e., AT = A−1),
Q is the (n × p) column-orthonormal matrix in a QR decomposition of X.
Note: W can be constructed by continuing the Gram-Schmidt
orthonormalization process (which was used to construct Q from
X) with X∗ = [ X In ].
Then consider
z = Ay = [QT y; WT y] = [zQ; zW], where zQ is (p × 1) and zW is ((n − p) × 1).
Proof of (a*)
β̂ = R−1 zQ follows from β̂ = (XT X)−1 XT y and X = QR with QT Q = Ip:
β̂ = (RT QT Q R)−1 RT QT y = (RT R)−1 RT QT y = R−1 (RT)−1 RT QT y = R−1 QT y = R−1 zQ,
using the invertibility of the (p × p) upper-triangular R (X of full rank).
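A quick numerical confirmation (my own sketch, on arbitrary simulated data): compute β̂ both from the reduced QR decomposition, as R−1 zQ, and from the normal equations, and compare.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Reduced QR: X = QR with Q (n x p), Q^T Q = I_p, R (p x p) upper triangular
Q, R = np.linalg.qr(X)
z_Q = Q.T @ y

beta_qr = np.linalg.solve(R, z_Q)               # beta_hat = R^{-1} z_Q
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)     # beta_hat from the normal equations
print(np.allclose(beta_qr, beta_ne))            # True
```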
Maximum-Likelihood Estimation
Consider the normal linear regression model:
y = Xβ + ε, where the εi are i.i.d. N(0, σ²), i.e.,
ε ∼ Nn(0n, σ² In)
or y ∼ Nn(Xβ, σ² In)
Definitions:
The likelihood function is
L(β, σ²) = p(y | X, β, σ²)
where p(y | X, β, σ²) is the joint probability density function
(pdf) of the conditional distribution of y given the data X
(known) and the parameters (β, σ²) (unknown).
The maximum likelihood estimates of (β, σ 2 ) are the values
maximizing L(β, σ 2 ), i.e., those which make the observed
data y most likely in terms of its pdf.
Because the yi are independent r.v.'s with yi ∼ N(µi, σ²), where
µi = Σ_{j=1}^p βj xi,j,
L(β, σ²) = Π_{i=1}^n p(yi | β, σ²)
= Π_{i=1}^n [1/√(2πσ²)] e^{−(1/(2σ²)) (yi − Σ_{j=1}^p βj xi,j)²}
= (2πσ²)^{−n/2} e^{−(1/2) (y − Xβ)T (σ² In)−1 (y − Xβ)}
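The equality of the product form and the matrix form can be verified numerically. A small Python sketch (my own, on simulated data) evaluating the log-likelihood both ways:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, p, sigma2 = 20, 2, 0.8
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5])
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# Log-likelihood as a sum of univariate normal log-densities
ll_sum = norm.logpdf(y, loc=X @ beta, scale=np.sqrt(sigma2)).sum()

# The same quantity in matrix form: -(n/2) log(2 pi sigma^2) - Q(beta)/(2 sigma^2)
r = y - X @ beta
ll_mat = -0.5 * n * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)

print(np.isclose(ll_sum, ll_mat))               # True
```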
The maximum likelihood estimates (β̂, σ̂²) maximize the
log-likelihood function (dropping constant terms):
log L(β, σ²) = −(n/2) log(σ²) − (1/2) (y − Xβ)T (σ² In)−1 (y − Xβ)
= −(n/2) log(σ²) − Q(β)/(2σ²)
where Q(β) = (y − Xβ)T (y − Xβ) (the "Least-Squares Criterion"!)
The OLS estimate β̂ is also the ML estimate.
The ML estimate of σ² solves
∂ log L(β̂, σ²)/∂(σ²) = 0, i.e., −(n/2)(1/σ²) − (1/2)(−1)(σ²)−2 Q(β̂) = 0
=⇒ σ̂²ML = Q(β̂)/n = (Σ_{i=1}^n ε̂i²)/n   (biased!)
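The bias is visible in a small simulation. A hedged Python sketch (my own; the factor E[Q(β̂)] = (n − p)σ² is standard theory, not stated on these slides): the Monte Carlo mean of σ̂²ML is close to σ²(n − p)/n, while rescaling by n/(n − p) removes the bias.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma2 = 15, 3, 2.0
X = rng.normal(size=(n, p))
beta = np.zeros(p)

def sigma2_ml():
    """One replication: ML estimate Q(beta_hat)/n of the error variance."""
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ beta_hat
    return (r @ r) / n

est = np.array([sigma2_ml() for _ in range(50000)])
print(est.mean())                   # close to sigma2 * (n - p)/n = 1.6, not 2.0
print(est.mean() * n / (n - p))     # bias-corrected, close to sigma2 = 2.0
```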
Generalized M Estimation
For data (y, X), fit the linear regression model
yi = xiT β + εi, i = 1, 2, . . . , n,
by specifying β = β̂ to minimize
Q(β) = Σ_{i=1}^n h(yi, xi, β, σ²).
The choice of the function h(·) distinguishes different estimators.
(1) Least Squares: h(yi, xi, β, σ²) = (yi − xiT β)²
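To make the h(·) abstraction concrete, here is a minimal Python sketch (my own; it simplifies h to a function of the residual alone, and the data are simulated): a generic numerical minimizer of Q(β) with a pluggable loss, shown recovering the OLS solution for the squared-error choice (1).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, p = 60, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 3.0]) + rng.normal(size=n)

def Q(beta, h):
    """Generalized M-estimation criterion: sum_i h(y_i - x_i^T beta)."""
    return h(y - X @ beta).sum()

h_ls = lambda r: r**2                               # choice (1): least squares

beta_m = minimize(Q, np.zeros(p), args=(h_ls,)).x   # numerical M-estimate
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)        # closed-form OLS
print(np.allclose(beta_m, beta_ols, atol=1e-4))     # True: squared loss gives OLS
```

Swapping h_ls for a robust loss (e.g., a Huber-type function) changes the estimator without changing the fitting machinery.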
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.