$Y$ is known as the dependent variable (also called the explained, response or predicted variable, or the regressand).
$X$ is known as the independent variable (also called the explanatory, control or predictor variable, or the regressor).
$u$ is the error term: it captures all the other factors that affect $Y$ but are not included in the model.
$\beta_0$ is the constant term or intercept: the value that $Y$ takes when $X = 0$ (and $u = 0$).
$\beta_1$ is the slope coefficient. Holding $u$ fixed, $\Delta Y = \beta_1 \Delta X$, so $\beta_1$ is the marginal effect of $X$ on $Y$.
Example
$Y = \beta_0 + \beta_1 X + u$, where $Y$ = wages and $X$ = education.
This equation shows the relationship between education and wages.
1. The relationship between education and wages is expected to be positive.
2. The relationship is linear: a one-unit change in education has the same effect on $Y$ regardless of the initial value of $X$.
3. $\beta_1$ measures the relationship between wages and education: the change in wages given another year of education, holding all other factors constant.
Empirical Economics
26163
Handout 1
The conditional expectation of $Y$ for a given $X$, $E(Y|X)$, is the line that passes through the conditional means of $Y$.
[Figure: conditional means of wages (vertical axis) at education levels $X_1$, $X_2$, $X_3$ (horizontal axis)]
This is called the population regression line.
We can now look at the population regression line (PRL) from another angle (Figure 2.1, Wooldridge).
1. Take one level of education, say $X_2$ years. A distribution of wages at this level of $X$ can be derived: it is the distribution of the wages of all the individuals with $X_2$ years of schooling.
2. Not everyone with a given value of $X$ earns the same wage.
3. For each level of $X$, the distribution of wages has its own mean and standard deviation.
Given this information, the wage distributions for every level of $X$ can all be represented in $Y$-$X$ space, with education on the horizontal axis and wages on the vertical axis.
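[Figure: wage distributions at education levels $X_1$, $X_2$, $X_3$, with wages on the vertical axis and education on the horizontal axis]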
For each distribution, the mean of $Y$ (read off the vertical axis) is the conditional mean of $Y$, $E(Y|X)$.
The PRL is the locus of all the conditional means of the dependent variable ($Y$, wages) for fixed values of the independent variable ($X$, education).
The distances of individual observations above and below these mean values are the errors.
What is required is that the average (that is, the expected value) of these deviations at any given $X$ is zero: $E(u|X) = 0$, which in turn implies $E(u) = 0$.
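To make the idea of the PRL as the locus of conditional means concrete, here is a small simulation sketch. The parameter values ($\beta_0 = 1$, $\beta_1 = 0.5$), the education levels and the normal error distribution are all hypothetical choices made for illustration; the point is only that, when $u$ is drawn with mean zero independently of $X$, the average wage at each education level sits on $\beta_0 + \beta_1 X$ and the average error at each level is close to zero.

```python
import random
from statistics import fmean

# Hypothetical population parameters and education levels, for illustration only
BETA0, BETA1 = 1.0, 0.5
EDUCATION_LEVELS = [8, 12, 16]   # stand-ins for X1, X2, X3
DRAWS_PER_LEVEL = 100_000

random.seed(0)

conditional_means = {}   # sample analogue of E(Y | X = x)
mean_errors = {}         # sample analogue of E(u | X = x)
for x in EDUCATION_LEVELS:
    # u is drawn independently of X with mean zero, so E(u|X) = 0 holds by construction
    errors = [random.gauss(0.0, 1.0) for _ in range(DRAWS_PER_LEVEL)]
    wages = [BETA0 + BETA1 * x + u for u in errors]
    conditional_means[x] = fmean(wages)
    mean_errors[x] = fmean(errors)
    print(f"X={x:2d}  mean Y={conditional_means[x]:.3f}  "
          f"PRL value={BETA0 + BETA1 * x:.3f}  mean u={mean_errors[x]:+.4f}")
```

Each printed mean of $Y$ should lie very close to the corresponding PRL value, and each mean error close to zero; with real data neither $\beta_0$, $\beta_1$ nor $u$ is observed.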
Another distinction:
$Y = \beta_0 + \beta_1 X + u$ is referred to as the stochastic population regression function (PRF): it shows how the dependent variable varies around its mean value because of the presence of the error term.
$E(Y|X) = \beta_0 + \beta_1 X$ is the deterministic or nonstochastic PRF, since it gives the mean of $Y$ corresponding to each specific value of $X$.
Example
Wage and education
$\widehat{wage} = -0.94 + 0.54\,education$
Wage is measured in dollars per hour.
Education denotes the years of schooling.
$n = 526$
The intercept is $-0.94$: a person with no education has a predicted hourly wage of $-94$ cents, which is clearly unrealistic.
[Figure: regression line of wages (vertical axis) on education (horizontal axis)]
The relationship is linear: another year of education increases the predicted wage by the same amount (54 cents per hour) regardless of the initial level of education.
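The arithmetic behind this interpretation can be checked directly. The sketch below simply evaluates the estimated equation above at a few education levels (the levels themselves are arbitrary illustrative choices) and shows that one extra year adds the same 0.54 dollars per hour whatever the starting point.

```python
# Coefficients of the estimated wage equation in the example above
INTERCEPT = -0.94   # dollars per hour
SLOPE = 0.54        # dollars per hour for each extra year of schooling

def predicted_wage(education_years: float) -> float:
    """Fitted hourly wage (in dollars) for a given number of years of schooling."""
    return INTERCEPT + SLOPE * education_years

for educ in (0, 8, 12, 16):
    print(educ, round(predicted_wage(educ), 2))

# Linearity: the effect of one more year does not depend on the initial level
print(round(predicted_wage(13) - predicted_wage(12), 2),
      round(predicted_wage(17) - predicted_wage(16), 2))
```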
$$Y = \beta_0 + \beta_1 X + u \qquad (1)$$
$Y$ and $X$ are variables, $\beta_0$ and $\beta_1$ are parameters and $u$ is the error term.
Assumption 1
$$E(u) = 0 \qquad (2)$$
And, since $u = Y - \beta_0 - \beta_1 X$,
$$E(u) = E(Y - \beta_0 - \beta_1 X)$$
Since $E(u) = 0$,
$$E(Y - \beta_0 - \beta_1 X) = 0 \qquad (3)**$$
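Assumption 2
Given the assumption that $u$ and $X$ are independent (Assumption 2),
$$\mathrm{cov}(X, u) = 0 \qquad (4)$$
Proof (writing $\mu_X = E(X)$):
$$\begin{aligned}
\mathrm{Cov}(X, u) &= E\big([X - E(X)][u - E(u)]\big) \\
&= E[(X - \mu_X)(u - 0)] \\
&= E(Xu - \mu_X u) \\
&= E(Xu) - E(\mu_X u) \\
&= E(Xu) - \mu_X E(u) \\
&= E(Xu)
\end{aligned}$$
so, by assumption,
$$E(Xu) = 0 \qquad (5)**$$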
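The chain of equalities in this proof can be checked numerically on a small set of equally likely outcomes. The values of $X$ and $u$ below are hypothetical, chosen so that $u$ averages to zero; the sketch confirms that $\mathrm{Cov}(X,u) = E(Xu) - \mu_X E(u)$, which collapses to $E(Xu)$ once $E(u) = 0$.

```python
from statistics import fmean

def covariance(a, b):
    """Covariance E[(A - E(A))(B - E(B))], treating each listed pair as one equally likely outcome."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)])

# Hypothetical values of X and u, chosen only for illustration (u averages to zero)
X = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0]
u = [0.5, -1.0, 0.3, -0.2, 0.9, -0.5]

mu_X = fmean(X)
E_Xu = fmean([xi * ui for xi, ui in zip(X, u)])

print(covariance(X, u))          # Cov(X, u) from the definition
print(E_Xu - mu_X * fmean(u))    # E(Xu) - mu_X E(u), the next-to-last line of the proof
print(E_Xu)                      # equals Cov(X, u) because E(u) = 0
```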
But because in most cases it is physically impossible to estimate the PRF (why?), we take a sample and estimate a sample regression function (SRF), written in stochastic form:
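$$Y = \hat\beta_0 + \hat\beta_1 X + \hat{u} \qquad (6)$$
$\hat\beta_0$ is an estimator of $\beta_0$, and $\hat\beta_1$ is an estimator of $\beta_1$.
$$\hat{Y} = \hat\beta_0 + \hat\beta_1 X \qquad (7)$$
since $E(\hat{u}) = 0$ (the sample counterpart of $E(u) = 0$).
Hence the mean of the fitted values equals the mean of the actual values,
$$\bar{\hat{Y}} = \bar{Y} \qquad (8)$$
but individual actual and fitted values generally differ,
$$Y \neq \hat{Y} \qquad (9)$$
with the gap given by the residual:
$$Y = \hat\beta_0 + \hat\beta_1 X + \hat{u} \qquad (9b)$$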
The residual is the deviation from the fitted line: the vertical distance between a data point and the fitted line. The residual serves as a proxy for the (unobservable) error.
residual: $\hat{u}_i = Y_i - \hat{Y}_i$
error: $u_i = Y_i - E(Y|X_i)$
PRL: $E(Y|X) = \beta_0 + \beta_1 X$
SRL: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
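Sample counterparts to equations 3 and 5:
$$E(Y - \hat\beta_0 - \hat\beta_1 X) = 0, \quad \text{i.e. } E(\hat{u}) = 0 \qquad (10)**$$
$$E[X(Y - \hat\beta_0 - \hat\beta_1 X)] = 0, \quad \text{i.e. } E(X\hat{u}) = 0 \qquad (11)**$$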
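Since the sample counterpart of an expected value is the sample mean (we sum over the observations and divide by the number of observations), we can rewrite the above as
$$\frac{1}{n}\sum_{i=1}^{n} \big(Y_i - \hat\beta_0 - \hat\beta_1 X_i\big) = 0 \qquad (12)**$$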
$$\mathrm{Cov}(X, \hat{u}) = E(X\hat{u}) = E[X(Y - \hat\beta_0 - \hat\beta_1 X)] = 0$$
$$\frac{1}{n}\sum_{i=1}^{n} X_i\big(Y_i - \hat\beta_0 - \hat\beta_1 X_i\big) = 0 \qquad (13)**$$
Note the hat on the parameters: it indicates that they are estimators of the population parameters. (** denotes the crucial equations.)
Equations 12 and 13 are very important: they are the first-order conditions, also known as the normal equations, and can be solved simultaneously to give $\hat\beta_0$ and $\hat\beta_1$:
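$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$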
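As a sketch of how these two formulas are applied, the function below computes $\hat\beta_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$, then checks that the resulting residuals satisfy the normal equations 12 and 13. The education and wage figures are hypothetical, invented only for illustration, and the function name `ols_estimates` is likewise illustrative.

```python
from statistics import fmean

def ols_estimates(x, y):
    """Return (b0_hat, b1_hat) for a simple regression of y on x, using the deviation-form formulas."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary in the sample, otherwise b1_hat is undefined")
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Hypothetical sample: years of education and hourly wages (illustrative numbers only)
educ = [8, 10, 12, 12, 14, 16, 16, 18]
wage = [3.1, 4.0, 5.5, 4.8, 7.2, 8.0, 7.4, 9.6]

b0_hat, b1_hat = ols_estimates(educ, wage)
residuals = [yi - b0_hat - b1_hat * xi for xi, yi in zip(educ, wage)]

# Normal equations (12) and (13): both sample averages are zero up to rounding error
print(fmean(residuals))
print(fmean([xi * ui for xi, ui in zip(educ, residuals)]))
```

The guard on `sxx` reflects the fact that the slope formula divides by the sample variation in $X$, so the estimates exist only if not every individual has the same education.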
Once we know these estimates we can find a value of $Y$ for each $X$: this is the fitted value of $Y$. Remember that $Y$ and $X$ are series (wages and education across individuals).
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i \qquad (21)$$
This shows the estimated relationship between the two variables. It is also called the sample regression function because, more often than not, it is physically impossible to include every member of a population; instead a sample is taken and used to obtain estimators of the population parameters.
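We can now illustrate this fitted line graphically (Fig. 2.4, Wooldridge).
[Figure: fitted line $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$; for observation $n$, the actual value $Y_n$, the fitted value $\hat{Y}_n$ and the residual between them]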
It shows the predicted or fitted value of Y given that X assumes a particular value.
The fitted value of Y may not coincide with the actual value of Y.
$$\hat{u}_i = Y_i - \hat{Y}_i \quad\text{or}\quad \hat{u}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i \qquad (22)$$
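The difference between residuals and errors is easiest to see in simulated data, where the "true" parameters are known. In the sketch below, $\beta_0 = -1$ and $\beta_1 = 0.5$ are hypothetical values, and the education levels and normal errors are illustrative assumptions. It computes fitted values as in equation (21) and residuals as in equation (22), then checks that $Y_i = \hat{Y}_i + \hat{u}_i$, that the fitted values have the same mean as $Y$, and that the residuals are not identical to the unobservable errors.

```python
import random
from statistics import fmean

random.seed(1)
BETA0, BETA1 = -1.0, 0.5   # hypothetical "true" parameters, known only because the data are simulated

x = [random.randint(8, 18) for _ in range(200)]             # years of education
errors = [random.gauss(0.0, 1.5) for _ in x]                 # u_i: unobservable with real data
y = [BETA0 + BETA1 * xi + ui for xi, ui in zip(x, errors)]

# OLS estimates from the deviation-form formulas
x_bar, y_bar = fmean(x), fmean(y)
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

fitted = [b0_hat + b1_hat * xi for xi in x]                  # equation (21)
residuals = [yi - fi for yi, fi in zip(y, fitted)]            # equation (22)

print(f"estimates: b0_hat={b0_hat:.3f}, b1_hat={b1_hat:.3f}")
print(f"mean of fitted values {fmean(fitted):.4f} vs mean of Y {y_bar:.4f}")
print(f"largest gap between a residual and its error: "
      f"{max(abs(r - e) for r, e in zip(residuals, errors)):.4f}")
```

The gap between $\hat{u}_i$ and $u_i$ equals $(\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)X_i$, so it vanishes only if the estimates happen to hit the true parameters exactly.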
Summary: In OLS
Ideally, the fitted values should be as close as possible to the actual values.
This means that the residuals must be made as small as possible.
There are several methods of doing this; the most frequently used is OLS (ordinary least squares), which chooses the estimates $\hat\beta_0$ and $\hat\beta_1$ so that the residual sum of squares (RSS) is as small as possible. The ideal case is an RSS of 0, which happens only when every data point lies exactly on the fitted line. (This is essentially what was done above.)
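A quick numerical sketch of the least-squares idea: compute the RSS at the OLS estimates and compare it with the RSS of a few nearby lines. The data and the perturbation sizes are hypothetical, chosen only for illustration. Because the RSS is a convex quadratic in the two coefficients (when $X$ varies), every alternative line should give a strictly larger RSS.

```python
from statistics import fmean

def rss(b0, b1, x, y):
    """Residual sum of squares for the candidate line Y = b0 + b1 * X."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical sample (illustrative numbers only)
x = [8, 10, 12, 12, 14, 16, 16, 18]
y = [3.1, 4.0, 5.5, 4.8, 7.2, 8.0, 7.4, 9.6]

x_bar, y_bar = fmean(x), fmean(y)
b1_ols = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_ols = y_bar - b1_ols * x_bar
best = rss(b0_ols, b1_ols, x, y)

# Nudge the coefficients away from the OLS values and compare RSS
PERTURBATIONS = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
for d0, d1 in PERTURBATIONS:
    print(d0, d1, rss(b0_ols + d0, b1_ols + d1, x, y) > best)
```

Here `best` is positive because the hypothetical points do not all lie on one line, which is the usual situation in practice.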
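Algebraically this means:
Minimise
$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 \qquad (1)$$
The first-order conditions, with respect to $\hat\beta_0$ and $\hat\beta_1$, are
$$-2\sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \qquad (2)$$
$$-2\sum_{i=1}^{n} X_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = 0 \qquad (3)$$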
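The rearrangement of these two first-order conditions can be sketched step by step. Dividing (2) and (3) by $-2$, expanding the sums and using $\sum_{i=1}^{n} X_i = n\bar{X}$ and $\sum_{i=1}^{n} Y_i = n\bar{Y}$ gives:

```latex
\begin{aligned}
\text{From (2):}\quad & \sum_{i=1}^{n} Y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} X_i
  \;\Rightarrow\; \hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} \\
\text{From (3):}\quad & \sum_{i=1}^{n} X_i Y_i = \hat\beta_0 \sum_{i=1}^{n} X_i + \hat\beta_1 \sum_{i=1}^{n} X_i^2 \\
\text{Substituting } \hat\beta_0:\quad
  & \sum_{i=1}^{n} X_i Y_i = n\bar{X}\bar{Y} - \hat\beta_1 n\bar{X}^2 + \hat\beta_1 \sum_{i=1}^{n} X_i^2 \\
\Rightarrow\quad & \hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}
\end{aligned}
```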
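Collecting terms and rearranging gives the solutions to the first-order conditions:
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$$
or
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
or
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$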
where