Chapters 2–3:
Classical Linear Regression
y = x1β1 + x2β2 + · · · + xKβK + ε
where y is the dependent, or endogenous vari-
able (sometimes termed the response variable)
and the x variables are the independent, or ex-
ogenous variables, often termed the regressors
or covariates. This of course presumes that the
relationship only involves one endogenous vari-
able, and that is in fact the setting for the clas-
sical linear regression model. If there are mul-
tiple endogenous variables in a relationship (as
there would be in a demand curve, or a macro
consumption function) then we must use more
advanced regression techniques to deal with
that endogeneity, or in economic terms simul-
taneity.
• Strict exogeneity of x: E[εi|xj1, xj2, . . . , xjK] = 0, ∀i, j. The distribution of εi does not depend on past, present, or future x values.
• Spherical disturbances: the covariance matrix of the error vector ε is σ²In. The error terms are independently and identically distributed (i.i.d.).
y = Ae^{rt}
may be made stochastic by adding an error term ε.
When logs are taken, this model becomes a lin-
ear relationship between ln y and t. The coef-
ficient r is the semi–elasticity of y with respect
to t: that is, the growth rate of y.
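As a minimal numerical sketch (not from the text; the data, parameter values, and use of NumPy are illustrative assumptions), the growth rate r can be recovered by regressing ln y on a constant and t:

    import numpy as np

    # Sketch: simulate the constant-growth model y = A*exp(r*t + eps)
    # and recover r by regressing ln y on t. A, r, sample size are made up.
    rng = np.random.default_rng(42)
    t = np.arange(50.0)                      # time index
    A, r = 2.0, 0.03                         # hypothetical level and growth rate
    y = A * np.exp(r * t + rng.normal(0, 0.05, t.size))

    # OLS of ln y on a constant and t; the slope estimates the semi-elasticity r
    X = np.column_stack([np.ones_like(t), t])
    b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    print(b)                                  # b[0] ~ ln A, b[1] ~ r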
Although this sleight of hand will allow many
models to be expressed in this linear form,
some models cannot be written in that man-
ner by any means. In that case, alternative
estimation techniques must be employed.
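For instance (an illustrative example, not taken from these notes), a model such as

y = β1 + β2 x^{β3} + ε

is intrinsically nonlinear: no transformation of y or the regressors makes it linear in the parameters, so it must be estimated by nonlinear least squares or maximum likelihood.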
• Full rank
• Strict exogeneity of x
• Spherical disturbances
• Stochastic regressors
In developing the regression model, it is com-
mon to assume that the xi are nonstochastic,
as they would be in an experimental setting.
This would simplify the assumptions above on
exogeneity and the distribution of the errors,
since then X would be considered a matrix
of fixed constants. But in social science re-
search, we rarely work with data from the ex-
perimental setting. If we consider that some
of the regressors in our model are stochastic,
then our assumption concerns the nature of
the data generating process that produces xi.
That process must be independent of the data
generating process underlying the errors if the
classical regression model is to be applied.
X'Xb = X'y
with solution
b = (X'X)⁻¹X'y
For the solution to be a minimum, the sec-
ond derivatives must form a positive definite
matrix: 2X'X. If the matrix X has full rank,
then this condition will be satisfied: the least
squares solution will be unique, and a minimum
of the sum of squared residuals. If X is rank–
deficient, the solution will not exist, as X'X
will be singular. In that instance, we cannot
uniquely solve for all K elements of b, since X
contains one or more linear dependencies.
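A minimal computational sketch (assuming NumPy; the design matrix and response below are made-up data) of solving the normal equations, and of the rank-deficient case:

    import numpy as np

    # Sketch of the least squares solution b = (X'X)^{-1} X'y on made-up data.
    rng = np.random.default_rng(0)
    n, K = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

    b = np.linalg.solve(X.T @ X, X.T @ y)    # solve X'Xb = X'y directly

    # If X contains a linear dependency, X'X is singular and b is not unique:
    X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])   # 4th column = sum of two others
    print(np.linalg.matrix_rank(X_bad.T @ X_bad))      # rank 3 < 4 columns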
b1 = Ȳ − b2X̄
This is known as “simple” regression, in which
we have only one regressor (beyond ι). In
multiple regression, the slope coefficients will
generally differ from those of simple regres-
sion. The simple regression coefficients are to-
tal derivatives of Y with respect to X, whereas
the multiple regression coefficients are partial
derivatives. So, for instance, a solution for the
aforementioned “three–variable” regression prob-
lem will yield
byg·t = byg/(1 − r²gt) − (byt btg)/(1 − r²gt)

where byg·t is the multiple regression coefficient of y on g, holding t fixed, and r²gt is the squared simple correlation between g and t. Note that if this correlation is 1 or -1,
the multiple regression cannot be computed
(X fails the full rank assumption). Note that
byg·t will differ from the simple regression co-
efficient byg due to the second term. The sec-
ond term could be zero for two reasons: (1) t
does not influence y, so byt is effectively zero;
or (2) g and t are uncorrelated, so that btg is
zero. In the latter case, the squared correla-
tion coefficient is also zero, and the formula
for the multiple regression coefficient becomes
that of the simple regression coefficient. Con-
versely, as long as there is correlation among
the regressors, the multiple regression coeffi-
cients will differ from simple regression coef-
ficients, as “partialling off” the effects of the
other regressors will matter. Thus, if the re-
gressors are orthogonal, the multiple regres-
sion coefficients will equal the simple regres-
sion coefficients. Generally, we will not find
orthogonal regressors in economic data except
by construction: for instance, binary variables
indicating that a given individual is M or F will
be orthogonal to one another, since their dot
product is zero.
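The following sketch (made-up data; NumPy assumed) fits the "three-variable" regression and confirms that the coefficient on g matches the byg·t formula above:

    import numpy as np

    # Sketch: compare simple and multiple regression slopes of y on (constant, g, t)
    # and check the b_{yg.t} formula. All data below are made up.
    rng = np.random.default_rng(1)
    n = 500
    t = rng.normal(size=n)
    g = 0.6 * t + rng.normal(size=n)          # g and t are correlated
    y = 1.0 + 2.0 * g - 1.0 * t + rng.normal(size=n)

    def slope(x, z):
        """Simple regression slope of x on z (with a constant)."""
        xd, zd = x - x.mean(), z - z.mean()
        return (zd @ xd) / (zd @ zd)

    b_yg, b_yt, b_tg = slope(y, g), slope(y, t), slope(t, g)
    r2 = np.corrcoef(g, t)[0, 1] ** 2

    # Multiple regression coefficient on g, from the full least squares solution
    X = np.column_stack([np.ones(n), g, t])
    b = np.linalg.lstsq(X, y, rcond=None)[0]

    b_formula = b_yg / (1 - r2) - b_yt * b_tg / (1 - r2)
    print(b[1], b_formula)                     # the two agree (up to rounding)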
Some results from the least squares solution of
a model with a constant term:
ŷ = (I − M)y = X(X'X)⁻¹X'y = Py
where the projection matrix P , which is also
symmetric and idempotent, generates the fit-
ted values as the projection of y on X, which is
also the projection of y into the column space
of X. Since M produces the residuals, and P
produces the fitted values, it follows that M
and P are orthogonal: P M = M P = 0, and by
simple algebra P X = X. We can then write
y = P y + M y, and consider the least squares
problem in that context:
y'y = y'P'Py + y'M'My
or
y'y = ŷ'ŷ + e'e
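A short numerical check of these identities (illustrative only; made-up data, NumPy assumed):

    import numpy as np

    # Sketch: verify the projection-matrix identities numerically.
    rng = np.random.default_rng(2)
    n, K = 20, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
    y = rng.normal(size=n)

    P = X @ np.linalg.inv(X.T @ X) @ X.T      # projection ("hat") matrix
    M = np.eye(n) - P                          # residual maker

    print(np.allclose(P @ M, 0), np.allclose(P @ X, X))     # PM = 0, PX = X
    yhat, e = P @ y, M @ y
    print(np.isclose(y @ y, yhat @ yhat + e @ e))            # y'y = ŷ'ŷ + e'e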
The orthogonality of P and M implies that the
cross–product terms that would result from
that expansion are zero. Given the definition
of ŷ, we can also write
y = Xb + e = ŷ + e
and premultiply by
M⁰ = [I − n⁻¹ιι'],
the idempotent matrix which transforms vari-
ables into deviations from their own means:
M⁰y = M⁰Xb + M⁰e
If we now square this equation, we find that