
Applied Econometrics

Master of Applied Economics Program


Universitas Padjadjaran
Today
Introduction to Maximum Likelihood Estimation
Application of Maximum Likelihood Estimation
Limited Dependent Variable Models
Probit
Logit

Additional References
Dougherty, Introduction to Econometrics, 4th Ed., 2011 *best for basics*
Freund, J., Mathematical Statistics, 1992
Myung, I.J., Tutorial on maximum likelihood estimation, Journal of Mathematical Psychology 47, 2003
Ramachandran & Tsokos, Mathematical Statistics with Applications, 2009

Method of ML
The method of maximum likelihood is intuitively appealing: we attempt to find the parameter values that would most likely have produced the data that we in fact observed.
For most cases of practical interest, maximum likelihood estimators perform optimally in sufficiently large samples.
Method of ML
To compute the likelihood, we need a good understanding of probability distributions (density functions).
Probabilities: Discrete Data
If our data are realizations of a discrete random variable, we work with the (discrete) probability distribution of the data:
a table, formula, or graph that lists all possible values the discrete random variable can assume, together with their associated probabilities.
Important examples: Binomial, Poisson

Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section R.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

             red   1    2    3    4    5    6
green   1          2    3    4    5    6    7
        2          3    4    5    6    7    8
        3          4    5    6    7    8    9
        4          5    6    7    8    9   10
        5          6    7    8    9   10   11
        6          7    8    9   10   11   12

X     f     p
2     1     1/36
3     2     2/36
4     3     3/36
5     4     4/36
6     5     5/36
7     6     6/36
8     5     5/36
9     4     4/36
10    3     3/36
11    2     2/36
12    1     1/36
[Figure: bar chart of the probability distribution of X, rising from 1/36 at X = 2 to a peak of 6/36 at X = 7 and falling back to 1/36 at X = 12.]

Dougherty 2012
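As a cross-check, here is a minimal Python sketch (not part of the original slides) that enumerates the 36 equally likely (red, green) outcomes and reproduces the table above; note that Fraction reduces 6/36 to 1/6.

```python
# Enumerate all 36 equally likely (red, green) outcomes and tabulate
# the distribution of X, the sum of the two dice.
from collections import Counter
from fractions import Fraction

counts = Counter(red + green for red in range(1, 7) for green in range(1, 7))
for x in sorted(counts):
    print(x, counts[x], Fraction(counts[x], 36))  # X, f, p (in lowest terms)
```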
Discrete Probability Distribution when we have more than 1 RV
The distribution of a single random variable is known as a univariate distribution.
But we might be interested in the intersection of two events, in which case we need to look at joint distributions.
The joint (probability) distributions of two or more random variables are termed bivariate or multivariate distributions.

Discrete Probability Distribution when we have more than 1 RV
If individual observations (y_i) are statistically independent of one another, then according to the theory of probability, the PDF for the data y = (y_1, y_2, ..., y_n) given the parameter vector w can be expressed as a product of the PDFs of the individual observations:

f(y_1, y_2, \ldots, y_n \mid w) = f(y_1 \mid w) \, f(y_2 \mid w) \cdots f(y_n \mid w)
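To make the product rule concrete, here is a small Python sketch with made-up Poisson count data and a hypothesized parameter value; the slides contain no code, so this is purely illustrative.

```python
# For independent observations, the likelihood of the whole sample is the
# product of the individual PMFs, illustrated here with a Poisson model.
from math import exp, factorial, prod

def poisson_pmf(y, lam):
    return lam ** y * exp(-lam) / factorial(y)

y = [2, 0, 3, 1]   # hypothetical observed counts
lam = 1.5          # hypothesized parameter value
likelihood = prod(poisson_pmf(yi, lam) for yi in y)
print(likelihood)
```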
Normal Distribution

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

Note constants: π = 3.14159..., e = 2.71828...

This is a bell-shaped curve with different centers and spreads depending on μ and σ.
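The density formula translates directly into code. This sketch (an illustration, not from the slides) evaluates it at a couple of points, including one value that reappears in the MLE example below.

```python
# Evaluate the normal density directly from the formula above.
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma) ** 2)

print(normal_pdf(0, 0, 1))    # 0.3989..., peak of the standard normal
print(normal_pdf(4, 3.5, 1))  # 0.3521..., used in the MLE example below
```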
INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION

This sequence introduces the principle of maximum likelihood estimation and illustrates it with some simple examples.

Suppose that you have a normally distributed random variable X with unknown population mean μ and standard deviation σ, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that σ is equal to 1.

Suppose initially you consider the hypothesis μ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175. The joint probability density, shown in the bottom chart, is the product of these, 0.0062.

Next consider the hypothesis μ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

Under the hypothesis μ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

Under the hypothesis μ = 5.0, the probability densities are both 0.2420, and the joint probability density is 0.0585.

Under the hypothesis μ = 5.5, the probability densities are 0.1295 and 0.3521, and the joint probability density is again 0.0456.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
4.0    0.3989   0.0540   0.0215
4.5    0.3521   0.1295   0.0456
5.0    0.2420   0.2420   0.0585
5.5    0.1295   0.3521   0.0456

The complete joint density function for all values of μ has now been plotted in the lower diagram. We see that it peaks at μ = 5.

[Figure: upper panel, the normal density p evaluated at the observations 4 and 6 under each hypothesized μ; lower panel, the joint density L as a function of μ, peaking at μ = 5.]
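The table can be reproduced numerically. The following Python sketch (illustrative, not from the slides) evaluates the two densities and their product for each hypothesized μ.

```python
# Reproduce the likelihood table for the sample (4, 6) with sigma = 1.
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma=1.0):
    return (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma) ** 2)

print(" mu   p(4)    p(6)    L")
for mu in (3.5, 4.0, 4.5, 5.0, 5.5):
    p4, p6 = normal_pdf(4, mu), normal_pdf(6, mu)
    print(f"{mu:4.1f}  {p4:.4f}  {p6:.4f}  {p4 * p6:.4f}")
```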
INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION

Now we will look at the mathematics of the example. If X is normally distributed with mean μ and standard deviation σ, its density function is

f(X) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}

For the time being, we are assuming σ is equal to 1, so the density function simplifies to

f(X) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(X-\mu)^2}

Hence we obtain the probability densities for the observations where X = 4 and X = 6:

f(4) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(4-\mu)^2} \qquad f(6) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(6-\mu)^2}

The joint probability density for the two observations in the sample is just the product of their individual densities:

\text{joint density} = \left( \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(4-\mu)^2} \right) \left( \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}(6-\mu)^2} \right)

In maximum likelihood estimation we choose as our estimate of μ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
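Rather than comparing a handful of candidate values, we can let an optimizer maximize the joint density. This sketch (assuming SciPy is available; it is not part of the slides) minimizes the negative log-likelihood and recovers the sample mean.

```python
# Maximizing the joint density over mu is equivalent to minimizing the
# negative log-likelihood; with sigma = 1 the optimum is the sample mean.
from math import log, pi
from scipy.optimize import minimize_scalar

data = [4.0, 6.0]

def neg_log_likelihood(mu):
    # -log f(x) = 0.5*log(2*pi) + 0.5*(x - mu)**2 when sigma = 1
    return sum(0.5 * log(2 * pi) + 0.5 * (x - mu) ** 2 for x in data)

print(minimize_scalar(neg_log_likelihood).x)  # 5.0, the mean of 4 and 6
```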
MLE AND REGRESSION ANALYSIS

MAXIMUM LIKELIHOOD ESTIMATION OF REGRESSION COEFFICIENTS

We will now apply the maximum likelihood principle to regression analysis, using the simple linear model Y = β_1 + β_2 X + u.

[Figure: scatter diagram of Y against X with the line β_1 + β_2 X; a black marker at X = X_i indicates the value β_1 + β_2 X_i, with a normal density curve drawn around it.]

The black marker shows the value that Y would have if X were equal to X_i and if there were no disturbance term. However, we will assume that there is a disturbance term in the model and that it has a normal distribution as shown.

Relative to the black marker, the curve represents the ex ante distribution for u, that is, its potential distribution before the observation is generated. Ex post, of course, it is fixed at some specific value.

Relative to the horizontal axis, the curve also represents the ex ante distribution for Y for that observation, that is, conditional on X = X_i. Potential values of Y close to β_1 + β_2 X_i will have relatively large densities, while potential values of Y relatively far from β_1 + β_2 X_i will have small ones.

The mean value of the distribution of Y_i is β_1 + β_2 X_i. Its standard deviation is σ, the standard deviation of the disturbance term. Hence the density function for the ex ante distribution of Y_i is

f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_i - \beta_1 - \beta_2 X_i}{\sigma}\right)^2}
The joint density function for the observations on Y is the product of their individual densities:

f(Y_1) \cdots f(Y_n) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}

Now, taking β_1, β_2, and σ as our choice variables, and taking the data on Y and X as given, we can re-interpret this function as the likelihood function for β_1, β_2, and σ:

L(\beta_1, \beta_2, \sigma \mid Y_1, \ldots, Y_n) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2}
We will choose β_1, β_2, and σ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead:

\log L = \log \left( \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2} \right)

As usual, the first step is to decompose the expression as the sum of the logarithms of the factors:

\log L = \log \left( \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_1 - \beta_1 - \beta_2 X_1}{\sigma}\right)^2} \right) + \cdots + \log \left( \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{Y_n - \beta_1 - \beta_2 X_n}{\sigma}\right)^2} \right)

Then we split the logarithm of each factor into two components. The first component is the same in each case, so the log-likelihood simplifies to

\log L = n \log \left( \frac{1}{\sigma\sqrt{2\pi}} \right) - \frac{1}{2\sigma^2} Z, \qquad \text{where } Z = (Y_1 - \beta_1 - \beta_2 X_1)^2 + \cdots + (Y_n - \beta_1 - \beta_2 X_n)^2
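As a quick numerical check of the simplification, the direct sum of log densities and the decomposed form agree. The data and parameter values in this sketch are made up purely for illustration.

```python
# Verify numerically that log L computed as a sum of log densities equals
# the simplified form n*log(1/(sigma*sqrt(2*pi))) - Z/(2*sigma^2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=20)
Y = 1.0 + 0.8 * X + rng.normal(0, 1.2, size=20)

beta1, beta2, sigma = 1.0, 0.8, 1.2
n = len(Y)
Z = np.sum((Y - beta1 - beta2 * X) ** 2)

direct = np.sum(norm.logpdf(Y, loc=beta1 + beta2 * X, scale=sigma))
simplified = n * np.log(1 / (sigma * np.sqrt(2 * np.pi))) - Z / (2 * sigma**2)
print(direct, simplified)  # identical up to floating-point error
```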
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of β_1 and β_2 to minimize Z is exactly what we did when we derived the least squares regression coefficients. Thus, for this regression model, the maximum likelihood estimators of β_1 and β_2 are identical to the least squares estimators.

As a consequence, Z will be the sum of the squares of the least squares residuals:

Z = \sum e_i^2, \qquad \text{where } e_i = Y_i - b_1 - b_2 X_i
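The equivalence can also be checked numerically. In this sketch (with simulated data, purely for illustration) minimizing Z with a general-purpose optimizer gives the same coefficients as least squares.

```python
# Minimizing Z over (beta1, beta2) reproduces the least squares estimates.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

def Z(beta):  # sum of squared deviations
    return np.sum((Y - beta[0] - beta[1] * X) ** 2)

ml = minimize(Z, x0=[0.0, 0.0]).x
b2, b1 = np.polyfit(X, Y, 1)  # polyfit returns the slope first
print(ml, (b1, b2))           # the two estimates agree
```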
To obtain the maximum likelihood estimator of σ, it is convenient to rearrange the log-likelihood function as

\log L = n \log \left( \frac{1}{\sigma} \right) + n \log \left( \frac{1}{\sqrt{2\pi}} \right) - \frac{1}{2\sigma^2} Z = -n \log \sigma - \frac{n}{2} \log (2\pi) - \frac{1}{2\sigma^2} Z

Differentiating it with respect to σ, we obtain

\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{Z}{\sigma^3}

The first-order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood estimator of the variance is the sum of the squares of the residuals divided by n:

\hat{\sigma}^2 = \frac{Z}{n} = \frac{\sum e_i^2}{n}

Note that this is biased in finite samples. To obtain an unbiased estimator, we should divide by n − k, where k is the number of parameters, in this case 2. However, the bias disappears as the sample size becomes large.
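To see the difference between the two variance estimators in practice, here is a self-contained sketch with simulated data (an illustration, not part of the slides).

```python
# Compare the ML variance estimator Z/n with the unbiased Z/(n - k), k = 2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

b2, b1 = np.polyfit(X, Y, 1)  # least squares = ML estimates
e = Y - b1 - b2 * X           # residuals
Z = np.sum(e ** 2)
n = len(Y)
print(Z / n, Z / (n - 2))     # biased ML estimate vs. unbiased estimate
```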
APPLICATIONS OF MLE
Probit and Logit Models
(Additional) References
Cramer, J.S., An Introduction to Logit Model for Economists, 2nd Ed., 2000, Timberlake Consultants Ltd (Chapter 2)
Hill, Griffiths, Judge, Undergraduate Econometrics, 2nd Ed., 2001 (Chapter 12)
Johnston, J., and DiNardo, J., Econometric Methods, 4th Ed., 1997, McGraw-Hill (Chapter 13)
Lye, Jenny, Limited Dependent Variables, Handout, Melbourne University, 2006
Vahid, Farshid, 2002, Applied Econometrics: Section A: Introduction to Microeconometrics, Handout, Monash University, Australia
Winkelmann & Boes, Analysis of Microdata, 2006 (Chapters 1-4)
