

Lecture 6: Regression Analysis


MIT 18.S096
Dr. Kempthorne

Fall 2013


Outline

Regression Analysis
Linear Regression: Overview
Ordinary Least Squares (OLS)
Gauss-Markov Theorem
Generalized Least Squares (GLS)
Distribution Theory: Normal Regression Models
Maximum Likelihood Estimation
Generalized M Estimation


Multiple Linear Regression: Setup

Data Set
  $n$ cases: $i = 1, 2, \ldots, n$
  1 response (dependent) variable: $y_i$, $i = 1, 2, \ldots, n$
  $p$ explanatory (independent) variables: $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$, $i = 1, 2, \ldots, n$

Goal of Regression Analysis:
  Extract/exploit the relationship between $y_i$ and $\mathbf{x}_i$.

Examples
  Prediction
  Causal Inference
  Approximation
  Functional Relationships

General Linear Model: For each case $i$, the conditional distribution $[y_i \mid \mathbf{x}_i]$ is given by
  $y_i = \hat{y}_i + \epsilon_i$
where
  $\hat{y}_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p}$
  $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ are the $p$ regression parameters (constant over all cases)
  $\epsilon_i$ is the residual (error) variable (varies over all cases)

Extensive breadth of possible models
  Polynomial approximation: $x_{i,j} = (x_i)^j$ (the explanatory variables are different powers of the same variable $x = x_i$)
  Fourier series: $x_{i,j} = \sin(j x_i)$ or $\cos(j x_i)$ (the explanatory variables are different sin/cos terms of a Fourier series expansion)
  Time series regressions: time is indexed by $i$, and the explanatory variables include lagged response values

Note: Linearity of $\hat{y}_i$ (in the regression parameters) is maintained even when the explanatory variables are non-linear functions of $x$.
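To make this breadth concrete, here is a minimal numpy sketch (not part of the original slides; the grid of 50 points and the choice p = 4 are arbitrary illustrative assumptions) that builds polynomial and Fourier design matrices from a single variable x:

import numpy as np

# Design matrices whose columns are nonlinear functions of one variable x;
# the model remains linear in the parameters beta.
x = np.linspace(0.0, 1.0, 50)          # hypothetical sample of n = 50 points
p = 4

# Polynomial approximation: x_{i,j} = (x_i)^j, j = 1, ..., p
X_poly = np.column_stack([x**j for j in range(1, p + 1)])

# Fourier series: sin(j x_i) and cos(j x_i) terms
X_fourier = np.column_stack(
    [f(j * x) for j in range(1, p + 1) for f in (np.sin, np.cos)]
)
print(X_poly.shape, X_fourier.shape)   # (50, 4) (50, 8)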

Steps for Fitting a Model

(1) Propose a model in terms of
    the response variable $Y$ (specify the scale),
    the explanatory variables $X_1, X_2, \ldots, X_p$ (include different functions of the explanatory variables if appropriate), and
    assumptions about the distribution of $\epsilon$ over the cases.
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and go to (1).


Specifying Assumptions in (1) for the Residual Distribution

  Gauss-Markov: zero mean, constant variance, uncorrelated
  Normal-linear models: the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s
  Generalized Gauss-Markov: zero mean, and general covariance matrix (possibly correlated, possibly heteroscedastic)
  Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto, Contaminated normal: some fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.s; the remaining fraction $(\delta)$ follows some contamination distribution)


Specifying the Estimator Criterion in (2)

  Least Squares
  Maximum Likelihood
  Robust (contamination-resistant)
  Bayes (assume the $\beta_j$ are r.v.s with a known prior distribution)
  Accommodating incomplete/missing data

Case Analyses for (4), Checking Assumptions

  Residual analysis: the model errors $\epsilon_i$ are unobservable; the model residuals for fitted regression parameters $\hat{\beta}_j$ are
    $e_i = y_i - [\hat{\beta}_1 x_{i,1} + \hat{\beta}_2 x_{i,2} + \cdots + \hat{\beta}_p x_{i,p}]$
  Influence diagnostics (identify cases which are highly influential)
  Outlier detection


Ordinary Least Squares Estimates

Least Squares Criterion: For $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$, define
  $Q(\beta) = \sum_{i=1}^n [y_i - \hat{y}_i]^2 = \sum_{i=1}^n [y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p})]^2$
The Ordinary Least-Squares (OLS) estimate $\hat{\beta}$ minimizes $Q(\beta)$.

Matrix Notation
  $\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$, $\quad \mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}$, $\quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}$


Solving for the OLS Estimate

  $\hat{\mathbf{y}} = (\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n)^T = \mathbf{X}\beta$, and
  $Q(\beta) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$

The OLS estimate $\hat{\beta}$ solves $\dfrac{\partial Q(\beta)}{\partial \beta_j} = 0$, $j = 1, 2, \ldots, p$, where

  $\dfrac{\partial Q(\beta)}{\partial \beta_j} = \sum_{i=1}^n 2\,[y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]\cdot(-x_{i,j})$
  $= -2\,(\mathbf{X}_{[j]})^T(\mathbf{y} - \mathbf{X}\beta)$, where $\mathbf{X}_{[j]}$ is the $j$th column of $\mathbf{X}$.

Solving for the OLS Estimate

  $\dfrac{\partial Q}{\partial \beta} = \begin{pmatrix} \partial Q/\partial \beta_1 \\ \partial Q/\partial \beta_2 \\ \vdots \\ \partial Q/\partial \beta_p \end{pmatrix} = -2\begin{pmatrix} \mathbf{X}_{[1]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \mathbf{X}_{[2]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \vdots \\ \mathbf{X}_{[p]}^T(\mathbf{y} - \mathbf{X}\beta) \end{pmatrix} = -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta)$

So the OLS estimate $\hat{\beta}$ solves the Normal Equations:
  $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat{\beta}) = \mathbf{0}$
  $\mathbf{X}^T\mathbf{X}\hat{\beta} = \mathbf{X}^T\mathbf{y}$
  $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

N.B. For $\hat{\beta}$ to exist (uniquely), $(\mathbf{X}^T\mathbf{X})$ must be invertible, i.e., $\mathbf{X}$ must have full column rank.
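As a concrete illustration, here is a minimal numpy sketch (simulated data; the dimensions and coefficients are arbitrary assumptions, not from the slides) that solves the normal equations directly and cross-checks against np.linalg.lstsq, which is numerically preferable when X^T X is ill-conditioned:

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))            # simulated design matrix (full column rank)
beta = np.array([1.0, -2.0, 0.5])      # hypothetical true parameters
y = X @ beta + rng.normal(scale=0.3, size=n)

# Solve the normal equations (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent solution via least squares, numerically safer for ill-conditioned X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))   # True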



(Ordinary) Least Squares Fit

OLS Estimate:
  $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)^T = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

Fitted Values:
  $\hat{\mathbf{y}} = \begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix} = \begin{pmatrix} x_{1,1}\hat{\beta}_1 + \cdots + x_{1,p}\hat{\beta}_p \\ x_{2,1}\hat{\beta}_1 + \cdots + x_{2,p}\hat{\beta}_p \\ \vdots \\ x_{n,1}\hat{\beta}_1 + \cdots + x_{n,p}\hat{\beta}_p \end{pmatrix} = \mathbf{X}\hat{\beta} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y}$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the $n \times n$ Hat Matrix.



(Ordinary) Least Squares Fit

The Hat Matrix $\mathbf{H}$ projects $\mathbb{R}^n$ onto the column space of $\mathbf{X}$.

Residuals: $\hat{\epsilon}_i = y_i - \hat{y}_i$, $i = 1, 2, \ldots, n$:
  $\hat{\epsilon} = (\hat{\epsilon}_1, \hat{\epsilon}_2, \ldots, \hat{\epsilon}_n)^T = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I}_n - \mathbf{H})\mathbf{y}$

Normal Equations: $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat{\beta}) = \mathbf{X}^T\hat{\epsilon} = \mathbf{0}_p$

N.B. The least-squares residual vector $\hat{\epsilon}$ is orthogonal to the column space of $\mathbf{X}$.
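These projection identities can be checked numerically. A small simulated-data sketch (illustrative, not from the slides): H should be idempotent (H² = H), and the residual vector should be orthogonal to every column of X:

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                # simulated full-rank design
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix H = X (X^T X)^{-1} X^T
y_hat = H @ y                              # fitted values y_hat = H y
eps_hat = y - y_hat                        # residuals (I_n - H) y

print(np.allclose(H @ H, H))               # H is idempotent (a projection): True
print(np.allclose(X.T @ eps_hat, 0.0))     # residuals orthogonal to col(X): True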

Gauss-Markov Theorem: Assumptions

Data $\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$ and $\mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}$
follow a linear model satisfying the Gauss-Markov assumptions if $\mathbf{y}$ is an observation of the random vector $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ and

  $E(\mathbf{Y} \mid \mathbf{X}, \beta) = \mathbf{X}\beta$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the $p$-vector of regression parameters, and
  $Cov(\mathbf{Y} \mid \mathbf{X}, \beta) = \sigma^2\mathbf{I}_n$, for some $\sigma^2 > 0$.

I.e., the random variables generating the observations are uncorrelated and have constant variance $\sigma^2$ (conditional on $\mathbf{X}$ and $\beta$).

Gauss-Markov Theorem

For known constants $c_1, c_2, \ldots, c_p, c_{p+1}$, consider the problem of estimating
  $\theta = c_1\beta_1 + c_2\beta_2 + \cdots + c_p\beta_p + c_{p+1}$.
Under the Gauss-Markov assumptions, the estimator
  $\hat{\theta} = c_1\hat{\beta}_1 + c_2\hat{\beta}_2 + \cdots + c_p\hat{\beta}_p + c_{p+1}$,
where $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are the least squares estimates, is

  1) an Unbiased Estimator of $\theta$, and
  2) a Linear Estimator of $\theta$, that is, $\hat{\theta} = \sum_{i=1}^n b_i y_i$ for some known (given $\mathbf{X}$) constants $b_i$.

Theorem: Under the Gauss-Markov assumptions, the estimator $\hat{\theta}$ has the smallest (Best) variance among all Linear Unbiased Estimators of $\theta$; i.e., $\hat{\theta}$ is BLUE.

Gauss-Markov Theorem: Proof

Proof: Without loss of generality, assume $c_{p+1} = 0$ and define $\mathbf{c} = (c_1, c_2, \ldots, c_p)^T$.
The least squares estimate of $\theta = \mathbf{c}^T\beta$ is
  $\hat{\theta} = \mathbf{c}^T\hat{\beta} = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \equiv \mathbf{d}^T\mathbf{y}$,
a linear estimate in $\mathbf{y}$ given by the coefficients $\mathbf{d} = (d_1, d_2, \ldots, d_n)^T$.
Consider an alternative linear estimate of $\theta$:
  $\tilde{\theta} = \mathbf{b}^T\mathbf{y}$
with fixed coefficients given by $\mathbf{b} = (b_1, \ldots, b_n)^T$.
Define $\mathbf{f} = \mathbf{b} - \mathbf{d}$ and note that
  $\tilde{\theta} = \mathbf{b}^T\mathbf{y} = (\mathbf{d} + \mathbf{f})^T\mathbf{y} = \hat{\theta} + \mathbf{f}^T\mathbf{y}$
If $\tilde{\theta}$ is unbiased, then because $\hat{\theta}$ is unbiased,
  $0 = E(\mathbf{f}^T\mathbf{y}) = \mathbf{f}^T E(\mathbf{y}) = \mathbf{f}^T(\mathbf{X}\beta)$ for all $\beta \in \mathbb{R}^p$
  $\Longrightarrow \mathbf{f}$ is orthogonal to the column space of $\mathbf{X}$
  $\Longrightarrow \mathbf{f}$ is orthogonal to $\mathbf{d} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}$

If $\tilde{\theta}$ is unbiased, then the orthogonality of $\mathbf{f}$ to $\mathbf{d}$ implies
  $Var(\tilde{\theta}) = Var(\mathbf{b}^T\mathbf{y}) = Var(\mathbf{d}^T\mathbf{y} + \mathbf{f}^T\mathbf{y})$
  $= Var(\mathbf{d}^T\mathbf{y}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,Cov(\mathbf{d}^T\mathbf{y}, \mathbf{f}^T\mathbf{y})$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T Cov(\mathbf{y})\,\mathbf{f}$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T(\sigma^2\mathbf{I}_n)\mathbf{f}$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2\,\mathbf{d}^T\mathbf{f}$
  $= Var(\hat{\theta}) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2 \cdot 0$
  $\geq Var(\hat{\theta})$
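The structure of this argument can be checked directly in code. A minimal sketch (simulated X; the perturbation f is an arbitrary vector projected off the column space of X, an assumption for illustration): any unbiased competitor b = d + f has variance sigma^2 (d'd + f'f), which is at least the OLS variance sigma^2 d'd:

import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
c = np.array([1.0, 0.0])                   # estimate theta = beta_1

d = X @ np.linalg.solve(X.T @ X, c)        # OLS weights: theta_hat = d^T y
f = rng.normal(size=n)
f -= X @ np.linalg.solve(X.T @ X, X.T @ f) # project f off col(X), so X^T f = 0
b = d + f                                  # alternative unbiased weights

sigma2 = 1.0                               # assumed error variance
var_ols = sigma2 * (d @ d)                 # Var(d^T y) = sigma^2 d^T d
var_alt = sigma2 * (b @ b)                 # Var(b^T y) = sigma^2 (d^T d + f^T f)
print(var_ols <= var_alt)                  # True: OLS is BLUE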


Generalized Least Squares (GLS) Estimates

Consider generalizing the Gauss-Markov assumptions for the linear regression model to
  $\mathbf{Y} = \mathbf{X}\beta + \epsilon$
where the random $n$-vector $\epsilon$ satisfies $E[\epsilon] = \mathbf{0}_n$ and $E[\epsilon\epsilon^T] = \sigma^2\Sigma$:
  $\sigma^2$ is an unknown scale parameter;
  $\Sigma$ is a known $(n \times n)$ positive definite matrix specifying the relative variances and correlations of the component observations.

Transform the data $(\mathbf{Y}, \mathbf{X})$ to $\mathbf{Y}^* = \Sigma^{-1/2}\mathbf{Y}$ and $\mathbf{X}^* = \Sigma^{-1/2}\mathbf{X}$, and the model becomes
  $\mathbf{Y}^* = \mathbf{X}^*\beta + \epsilon^*$, where $E[\epsilon^*] = \mathbf{0}_n$ and $E[\epsilon^*(\epsilon^*)^T] = \sigma^2\mathbf{I}_n$.
By the Gauss-Markov Theorem, the BLUE (GLS) estimate of $\beta$ is
  $\hat{\beta} = [(\mathbf{X}^*)^T(\mathbf{X}^*)]^{-1}(\mathbf{X}^*)^T\mathbf{Y}^* = [\mathbf{X}^T\Sigma^{-1}\mathbf{X}]^{-1}(\mathbf{X}^T\Sigma^{-1}\mathbf{Y})$
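A minimal sketch of the GLS computation (simulated heteroscedastic data; the diagonal Sigma and all dimensions are made-up assumptions): whiten with a Cholesky factor of Sigma (any square root of Sigma serves the role of Sigma^{-1/2} here), fit OLS on the transformed data, and confirm it matches the closed form:

import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 2
X = rng.normal(size=(n, p))
Sigma = np.diag(rng.uniform(0.5, 4.0, size=n))   # known relative variances

L = np.linalg.cholesky(Sigma)                    # Sigma = L L^T
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + L @ rng.normal(size=n)       # errors with covariance Sigma

Xs = np.linalg.solve(L, X)                       # whitened design  L^{-1} X
ys = np.linalg.solve(L, y)                       # whitened response L^{-1} y
beta_gls_whitened, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

Si = np.linalg.inv(Sigma)                        # closed form [X' S^-1 X]^-1 X' S^-1 y
beta_gls_direct = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(np.allclose(beta_gls_whitened, beta_gls_direct))   # True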

Normal Linear Regression Models

Distribution Theory
  $Y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \epsilon_i = \mu_i + \epsilon_i$
Assume $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ are i.i.d. $N(0, \sigma^2)$.
  $\Longrightarrow [Y_i \mid x_{i,1}, x_{i,2}, \ldots, x_{i,p}, \beta, \sigma^2] \sim N(\mu_i, \sigma^2)$, independent over $i = 1, 2, \ldots, n$.
Conditioning on $\mathbf{X}$, $\beta$, and $\sigma^2$:
  $\mathbf{Y} = \mathbf{X}\beta + \epsilon$, where $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$

Distribution Theory

  $\mu = (\mu_1, \ldots, \mu_n)^T = E(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \mathbf{X}\beta$

  $\Sigma = Cov(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2\mathbf{I}_n$

That is, $\Sigma_{i,j} = Cov(Y_i, Y_j \mid \mathbf{X}, \beta, \sigma^2) = \sigma^2\delta_{i,j}$.

Apply Moment-Generating Functions (MGFs) to derive
  the joint distribution of $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$, and
  the joint distribution of $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p)^T$.


MGF of $\mathbf{Y}$

For the $n$-variate r.v. $\mathbf{Y}$ and a constant $n$-vector $\mathbf{t} = (t_1, \ldots, t_n)^T$,
  $M_{\mathbf{Y}}(\mathbf{t}) = E(e^{\mathbf{t}^T\mathbf{Y}}) = E(e^{t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n})$
  $= E(e^{t_1 Y_1}) \cdot E(e^{t_2 Y_2}) \cdots E(e^{t_n Y_n})$   (by independence)
  $= M_{Y_1}(t_1) \cdot M_{Y_2}(t_2) \cdots M_{Y_n}(t_n)$
  $= \prod_{i=1}^n e^{t_i\mu_i + \frac{1}{2}t_i^2\sigma^2}$
  $= e^{\sum_{i=1}^n t_i\mu_i + \frac{1}{2}\sum_{i,k=1}^n t_i\Sigma_{i,k}t_k} = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$

$\Longrightarrow \mathbf{Y} \sim N_n(\mu, \Sigma)$, multivariate normal with mean $\mu$ and covariance $\Sigma$.


MGF of $\hat{\beta}$

For the $p$-variate r.v. $\hat{\beta}$ and a constant $p$-vector $\tau = (\tau_1, \ldots, \tau_p)^T$,
  $M_{\hat{\beta}}(\tau) = E(e^{\tau^T\hat{\beta}}) = E(e^{\tau_1\hat{\beta}_1 + \tau_2\hat{\beta}_2 + \cdots + \tau_p\hat{\beta}_p})$

Defining $\mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ we can express $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{A}\mathbf{Y}$, and

  $M_{\hat{\beta}}(\tau) = E(e^{\tau^T\hat{\beta}}) = E(e^{\tau^T\mathbf{A}\mathbf{Y}})$
  $= E(e^{\mathbf{t}^T\mathbf{Y}})$, with $\mathbf{t} = \mathbf{A}^T\tau$
  $= M_{\mathbf{Y}}(\mathbf{t}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$

MGF of $\hat{\beta}$

For $\hat{\beta}$:
  $M_{\hat{\beta}}(\tau) = E(e^{\tau^T\hat{\beta}}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$

Plug in:
  $\mathbf{t} = \mathbf{A}^T\tau = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\tau$
  $\mu = \mathbf{X}\beta$
  $\Sigma = \sigma^2\mathbf{I}_n$
Gives:
  $\mathbf{t}^T\mu = \tau^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\beta = \tau^T\beta$
  $\mathbf{t}^T\Sigma\mathbf{t} = \tau^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T[\sigma^2\mathbf{I}_n]\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\tau = \tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau$
So the MGF of $\hat{\beta}$ is
  $M_{\hat{\beta}}(\tau) = e^{\tau^T\beta + \frac{1}{2}\tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau}$


Marginal Distributions of Least Squares Estimates

Because
  $\hat{\beta} \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$
the marginal distribution of each $\hat{\beta}_j$ is
  $\hat{\beta}_j \sim N(\beta_j, \sigma^2 C_{j,j})$
where $C_{j,j}$ is the $j$th diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$.
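This marginal can be checked by simulation (an illustrative sketch; the design, sigma, and number of replications are arbitrary assumptions): the empirical variance of each coefficient across repeated draws of y should approach sigma^2 C_{j,j}:

import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 40, 3, 0.5
X = rng.normal(size=(n, p))                 # fixed design across replications
beta = np.array([1.0, 0.0, -1.0])
C = np.linalg.inv(X.T @ X)

# Refit OLS on 10,000 independent draws of y = X beta + noise
draws = np.array([
    np.linalg.lstsq(X, X @ beta + sigma * rng.normal(size=n), rcond=None)[0]
    for _ in range(10000)
])
print(draws.var(axis=0))            # empirical variances of beta_hat_j
print(sigma**2 * np.diag(C))        # theoretical sigma^2 C_{j,j}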


The Q-R Decomposition of X

Consider expressing the $(n \times p)$ matrix $\mathbf{X}$ of explanatory variables as
  $\mathbf{X} = \mathbf{Q}\mathbf{R}$
where
  $\mathbf{Q}$ is an $(n \times p)$ orthonormal matrix, i.e., $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_p$, and
  $\mathbf{R}$ is a $(p \times p)$ upper-triangular matrix.
The columns of $\mathbf{Q} = [\mathbf{Q}_{[1]}, \mathbf{Q}_{[2]}, \ldots, \mathbf{Q}_{[p]}]$ can be constructed by performing the Gram-Schmidt orthonormalization procedure on the columns of $\mathbf{X} = [\mathbf{X}_{[1]}, \mathbf{X}_{[2]}, \ldots, \mathbf{X}_{[p]}]$.


If
  $\mathbf{R} = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,p-1} & r_{1,p} \\ 0 & r_{2,2} & \cdots & r_{2,p-1} & r_{2,p} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & r_{p-1,p-1} & r_{p-1,p} \\ 0 & 0 & \cdots & 0 & r_{p,p} \end{pmatrix}$, then

  $\mathbf{X}_{[1]} = \mathbf{Q}_{[1]}r_{1,1}$
  $\Longrightarrow r_{1,1}^2 = \mathbf{X}_{[1]}^T\mathbf{X}_{[1]}$ and $\mathbf{Q}_{[1]} = \mathbf{X}_{[1]}/r_{1,1}$

  $\mathbf{X}_{[2]} = \mathbf{Q}_{[1]}r_{1,2} + \mathbf{Q}_{[2]}r_{2,2}$
  $\Longrightarrow \mathbf{Q}_{[1]}^T\mathbf{X}_{[2]} = \mathbf{Q}_{[1]}^T\mathbf{Q}_{[1]}r_{1,2} + \mathbf{Q}_{[1]}^T\mathbf{Q}_{[2]}r_{2,2} = 1 \cdot r_{1,2} + 0 \cdot r_{2,2} = r_{1,2}$ (known, since $\mathbf{Q}_{[1]}$ is specified)

With $r_{1,2}$ and $\mathbf{Q}_{[1]}$ specified, we can solve for $r_{2,2}$:
  $\mathbf{Q}_{[2]}r_{2,2} = \mathbf{X}_{[2]} - \mathbf{Q}_{[1]}r_{1,2}$
Taking the squared norm of both sides:
  $r_{2,2}^2 = \mathbf{X}_{[2]}^T\mathbf{X}_{[2]} - 2r_{1,2}\mathbf{Q}_{[1]}^T\mathbf{X}_{[2]} + r_{1,2}^2$
(all terms on the RHS are known)
With $r_{2,2}$ specified,
  $\mathbf{Q}_{[2]} = r_{2,2}^{-1}\left(\mathbf{X}_{[2]} - r_{1,2}\mathbf{Q}_{[1]}\right)$
Etc. (solve for the remaining elements of $\mathbf{R}$ and columns of $\mathbf{Q}$).

With the Q-R Decomposition

  $\mathbf{X} = \mathbf{Q}\mathbf{R}$  ($\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_p$, and $\mathbf{R}$ is $p \times p$ upper-triangular)

  $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y}$  (plug in $\mathbf{X} = \mathbf{Q}\mathbf{R}$ and simplify)

  $Cov(\hat{\beta}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2\mathbf{R}^{-1}(\mathbf{R}^{-1})^T$

  $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \mathbf{Q}\mathbf{Q}^T$  (giving $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$ and $\hat{\epsilon} = (\mathbf{I}_n - \mathbf{H})\mathbf{y}$)
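In code (a simulated-data sketch, not from the slides): np.linalg.qr returns the reduced factorization used here, and a triangular solve of R beta_hat = Q^T y reproduces the normal-equation solution without ever forming (X^T X)^{-1}:

import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

Q, R = np.linalg.qr(X)                       # reduced QR: Q is n x p, R is p x p
beta_qr = np.linalg.solve(R, Q.T @ y)        # solve R beta_hat = Q^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations, for comparison
print(np.allclose(beta_qr, beta_ne))         # True

H = Q @ Q.T                                  # hat matrix H = Q Q^T
print(np.allclose(H, X @ np.linalg.solve(X.T @ X, X.T)))   # True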


More Distribution Theory

Assume $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where the $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e.,
  $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$, or $\mathbf{y} \sim N_n(\mathbf{X}\beta, \sigma^2\mathbf{I}_n)$.

Theorem*: For any $(m \times n)$ matrix $\mathbf{A}$ of rank $m \leq n$, the random normal vector $\mathbf{y}$ transformed by $\mathbf{A}$,
  $\mathbf{z} = \mathbf{A}\mathbf{y}$,
is also a random normal vector:
  $\mathbf{z} \sim N_m(\mu_z, \Sigma_z)$
where $\mu_z = \mathbf{A}E(\mathbf{y}) = \mathbf{A}\mathbf{X}\beta$ and $\Sigma_z = \mathbf{A}\,Cov(\mathbf{y})\,\mathbf{A}^T = \sigma^2\mathbf{A}\mathbf{A}^T$.

Earlier, $\mathbf{A} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ yields the distribution of $\hat{\beta} = \mathbf{A}\mathbf{y}$.
With a different definition of $\mathbf{A}$ (and $\mathbf{z}$) we give an easy proof of:

Theorem: For the normal linear regression model
  $\mathbf{y} = \mathbf{X}\beta + \epsilon$,
where $\mathbf{X}$ ($n \times p$) has rank $p$ and $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$:

(a) $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and $\hat{\epsilon} = \mathbf{y} - \mathbf{X}\hat{\beta}$ are independent r.v.s
(b) $\hat{\beta} \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$
(c) $\sum_{i=1}^n \hat{\epsilon}_i^2 = \hat{\epsilon}^T\hat{\epsilon} \sim \sigma^2\chi^2_{n-p}$ (a scaled Chi-squared r.v.)
(d) For each $j = 1, 2, \ldots, p$:
  $t_j = \dfrac{\hat{\beta}_j - \beta_j}{\hat{\sigma}\sqrt{C_{j,j}}} \sim t_{n-p}$ ($t$ distribution)
where
  $\hat{\sigma}^2 = \dfrac{1}{n-p}\sum_{i=1}^n \hat{\epsilon}_i^2$ and $C_{j,j} = [(\mathbf{X}^T\mathbf{X})^{-1}]_{j,j}$.
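As an illustration of (d), a short sketch computing the per-coefficient t statistics under the null value beta_j = 0 (simulated data; in practice one would compare each t_j against t_{n-p} quantiles, e.g., via scipy.stats.t):

import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
eps_hat = y - X @ beta_hat
sigma2_hat = eps_hat @ eps_hat / (n - p)           # unbiased estimate of sigma^2
C = np.linalg.inv(X.T @ X)
t_stats = beta_hat / np.sqrt(sigma2_hat * np.diag(C))   # (beta_hat_j - 0) / SE_j
print(t_stats)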

Proof: Note that (d) follows immediately from (a), (b), and (c).

Define $\mathbf{A} = \begin{pmatrix} \mathbf{Q}^T \\ \mathbf{W}^T \end{pmatrix}$, where
  $\mathbf{A}$ is an $(n \times n)$ orthogonal matrix (i.e., $\mathbf{A}^T = \mathbf{A}^{-1}$), and
  $\mathbf{Q}$ is the column-orthonormal matrix in a Q-R decomposition of $\mathbf{X}$.
Note: $\mathbf{W}$ can be constructed by continuing the Gram-Schmidt orthonormalization process (which was used to construct $\mathbf{Q}$ from $\mathbf{X}$) with $\mathbf{X}^* = [\,\mathbf{X} \;\; \mathbf{I}_n\,]$.

Then, consider
  $\mathbf{z} = \mathbf{A}\mathbf{y} = \begin{pmatrix} \mathbf{Q}^T\mathbf{y} \\ \mathbf{W}^T\mathbf{y} \end{pmatrix} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix}$, with $\mathbf{z}_Q$ of dimension $(p \times 1)$ and $\mathbf{z}_W$ of dimension $((n-p) \times 1)$.

The distribution of $\mathbf{z} = \mathbf{A}\mathbf{y}$ is $N_n(\mu_z, \Sigma_z)$, where
  $\mu_z = \mathbf{A}\mathbf{X}\beta = \begin{pmatrix} \mathbf{Q}^T \\ \mathbf{W}^T \end{pmatrix}\mathbf{Q}[\mathbf{R}\beta] = \begin{pmatrix} \mathbf{Q}^T\mathbf{Q} \\ \mathbf{W}^T\mathbf{Q} \end{pmatrix}[\mathbf{R}\beta] = \begin{pmatrix} \mathbf{I}_p \\ \mathbf{0}_{(n-p)\times p} \end{pmatrix}[\mathbf{R}\beta] = \begin{pmatrix} \mathbf{R}\beta \\ \mathbf{0}_{n-p} \end{pmatrix}$
  $\Sigma_z = \mathbf{A}[\sigma^2\mathbf{I}_n]\mathbf{A}^T = \sigma^2[\mathbf{A}\mathbf{A}^T] = \sigma^2\mathbf{I}_n$, since $\mathbf{A}^T = \mathbf{A}^{-1}$.


Thus
  $\mathbf{z} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix} \sim N_n\left(\begin{pmatrix} \mathbf{R}\beta \\ \mathbf{0}_{n-p} \end{pmatrix}, \sigma^2\mathbf{I}_n\right)$
so that
  $\mathbf{z}_Q \sim N_p(\mathbf{R}\beta, \sigma^2\mathbf{I}_p)$,
  $\mathbf{z}_W \sim N_{n-p}(\mathbf{0}_{n-p}, \sigma^2\mathbf{I}_{n-p})$, and
  $\mathbf{z}_Q$ and $\mathbf{z}_W$ are independent.

The Theorem follows by showing:
(a*) $\hat{\beta} = \mathbf{R}^{-1}\mathbf{z}_Q$ and $\hat{\epsilon} = \mathbf{W}\mathbf{z}_W$ (i.e., $\hat{\beta}$ and $\hat{\epsilon}$ are functions of different, independent vectors).
(b*) Deducing the distribution of $\hat{\beta} = \mathbf{R}^{-1}\mathbf{z}_Q$ by applying Theorem* with $\mathbf{A} = \mathbf{R}^{-1}$ and $\mathbf{y} = \mathbf{z}_Q$.
(c*) $\hat{\epsilon}^T\hat{\epsilon} = \mathbf{z}_W^T\mathbf{z}_W$ is a sum of $(n-p)$ squared r.v.s which are i.i.d. $N(0, \sigma^2)$, hence $\sim \sigma^2\chi^2_{n-p}$, a scaled Chi-squared r.v.

Proof of (a*)

$\hat{\beta} = \mathbf{R}^{-1}\mathbf{z}_Q$ follows from $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and $\mathbf{X} = \mathbf{Q}\mathbf{R}$ with $\mathbf{Q}^T\mathbf{Q} = \mathbf{I}_p$.

  $\hat{\epsilon} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\beta}$
  $= \mathbf{y} - (\mathbf{Q}\mathbf{R})(\mathbf{R}^{-1}\mathbf{z}_Q)$
  $= \mathbf{y} - \mathbf{Q}\mathbf{z}_Q$
  $= \mathbf{y} - \mathbf{Q}\mathbf{Q}^T\mathbf{y} = (\mathbf{I}_n - \mathbf{Q}\mathbf{Q}^T)\mathbf{y}$
  $= \mathbf{W}\mathbf{W}^T\mathbf{y}$  (since $\mathbf{I}_n = \mathbf{A}^T\mathbf{A} = \mathbf{Q}\mathbf{Q}^T + \mathbf{W}\mathbf{W}^T$)
  $= \mathbf{W}\mathbf{z}_W$


Maximum-Likelihood Estimation

Consider the normal linear regression model:
  $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where the $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e.,
  $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$, or $\mathbf{y} \sim N_n(\mathbf{X}\beta, \sigma^2\mathbf{I}_n)$.

Definitions:
  The likelihood function is
    $L(\beta, \sigma^2) = p(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)$
  where $p(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)$ is the joint probability density function (pdf) of the conditional distribution of $\mathbf{y}$ given the data $\mathbf{X}$ (known) and the parameters $(\beta, \sigma^2)$ (unknown).
  The maximum likelihood estimates of $(\beta, \sigma^2)$ are the values maximizing $L(\beta, \sigma^2)$, i.e., those which make the observed data $\mathbf{y}$ most likely in terms of its pdf.

Because the $y_i$ are independent r.v.s with $y_i \sim N(\mu_i, \sigma^2)$, where $\mu_i = \sum_{j=1}^p \beta_j x_{i,j}$,

  $L(\beta, \sigma^2) = \prod_{i=1}^n p(y_i \mid \beta, \sigma^2) = \prod_{i=1}^n \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left(y_i - \sum_{j=1}^p \beta_j x_{i,j}\right)^2}$
  $= \dfrac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2}(\mathbf{y} - \mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y} - \mathbf{X}\beta)}$

The maximum likelihood estimates $(\hat{\beta}, \hat{\sigma}^2)$ maximize the log-likelihood function (dropping constant terms):
  $\log L(\beta, \sigma^2) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2}(\mathbf{y} - \mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y} - \mathbf{X}\beta) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}Q(\beta)$
where $Q(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$ (the Least-Squares Criterion!).

So the OLS estimate $\hat{\beta}$ is also the ML estimate.

The ML estimate $\hat{\sigma}^2$ of $\sigma^2$ solves
  $\dfrac{\partial}{\partial(\sigma^2)}\log L(\hat{\beta}, \sigma^2) = 0$, i.e., $-\dfrac{n}{2}\cdot\dfrac{1}{\sigma^2} + \dfrac{1}{2}(\sigma^2)^{-2}Q(\hat{\beta}) = 0$
  $\Longrightarrow \hat{\sigma}^2_{ML} = Q(\hat{\beta})/n = \left(\sum_{i=1}^n \hat{\epsilon}_i^2\right)/n$ (biased!)
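A small simulated check of this bias (an illustrative sketch; n, p, and sigma are arbitrary assumptions): the ML estimate divides the residual sum of squares by n, while dividing by n - p gives the unbiased estimate used earlier for the t statistics:

import numpy as np

rng = np.random.default_rng(6)
n, p, sigma = 200, 4, 1.5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + sigma * rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)              # Q(beta_hat)

sigma2_ml = rss / n                                # ML estimate: biased low
sigma2_unbiased = rss / (n - p)                    # degrees-of-freedom correction
print(sigma2_ml, sigma2_unbiased, sigma**2)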

Generalized M Estimation

For data $(\mathbf{y}, \mathbf{X})$, fit the linear regression model
  $y_i = \mathbf{x}_i^T\beta + \epsilon_i$, $i = 1, 2, \ldots, n$
by specifying $\hat{\beta}$ to minimize
  $Q(\beta) = \sum_{i=1}^n h(y_i, \mathbf{x}_i, \beta, \sigma^2)$

The choice of the function $h(\cdot)$ distinguishes different estimators:
(1) Least Squares: $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = (y_i - \mathbf{x}_i^T\beta)^2$
(2) Mean Absolute Deviation (MAD): $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = |y_i - \mathbf{x}_i^T\beta|$
(3) Maximum Likelihood (ML): Assume the $y_i$ are independent with pdfs $p(y_i \mid \beta, \mathbf{x}_i, \sigma^2)$, and take
  $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = -\log p(y_i \mid \beta, \mathbf{x}_i, \sigma^2)$
(4) Robust M-Estimator: $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = \chi(y_i - \mathbf{x}_i^T\beta)$, where $\chi(\cdot)$ is even and monotone increasing on $(0, \infty)$.

(5) Quantile Estimator: For $\tau$: $0 < \tau < 1$, a fixed quantile,
  $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = \begin{cases} \tau\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i \geq \mathbf{x}_i^T\beta \\ (1 - \tau)\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i < \mathbf{x}_i^T\beta \end{cases}$
E.g., $\tau = 0.90$ corresponds to the 90th quantile / upper decile; $\tau = 0.50$ corresponds to the MAD estimator.
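As one way to see this criterion in action, here is a hedged sketch that minimizes the quantile loss with scipy's general-purpose Nelder-Mead routine (a dedicated quantile-regression solver would be preferable in practice; the simulated data, tau = 0.90, and OLS starting point are arbitrary assumptions):

import numpy as np
from scipy.optimize import minimize

def quantile_loss(beta, X, y, tau):
    # Q(beta) = sum_i h(y_i, x_i, beta) for the quantile criterion above
    r = y - X @ beta
    return np.sum(np.where(r >= 0, tau * np.abs(r), (1 - tau) * np.abs(r)))

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])          # intercept + slope
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)    # heavy-tailed errors

beta0 = np.linalg.lstsq(X, y, rcond=None)[0]                   # OLS starting point
fit90 = minimize(quantile_loss, beta0, args=(X, y, 0.90), method="Nelder-Mead")
print(fit90.x)    # approximate 90th-percentile regression line

Setting tau = 0.50 in the same call recovers the MAD fit, consistent with the remark above.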


MIT OpenCourseWare
http://ocw.mit.edu

18.S096 Topics in Mathematics with Applications in Finance


Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
