f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}

(Here π ≈ 3.14159 and e ≈ 2.71828.)

This is a bell-shaped curve with different centers and spreads depending on μ and σ.
Method of Maximum Likelihood

The method of maximum likelihood is intuitively appealing, because we attempt to find the values of the true parameters that would most likely have produced the data that we in fact observed. For most cases of practical interest, the performance of maximum likelihood estimators is optimal in large samples.
[Figure: top panel shows the density p of X; bottom panel shows the joint density L as a function of μ]
This sequence introduces the principle of maximum likelihood estimation and illustrates it with some simple examples.

Suppose that you have a normally distributed random variable X with unknown population mean μ and standard deviation σ, and that you have a sample of two observations, 4 and 6. For the time being, we will assume that σ is equal to 1.
Suppose initially you consider the hypothesis μ = 3.5. Under this hypothesis the probability density at 4 would be 0.3521 and that at 6 would be 0.0175.
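These two densities can be verified with a few lines of Python (a sketch; `normal_pdf` is a hypothetical helper written for this illustration, not part of the text):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return (1.0 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma) ** 2)

# Densities of the two observations under the hypothesis mu = 3.5
p4 = normal_pdf(4, 3.5)   # ≈ 0.3521
p6 = normal_pdf(6, 3.5)   # ≈ 0.0175
print(round(p4, 4), round(p6, 4))
```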
μ      p(4)     p(6)
3.5    0.3521   0.0175
INTRODUCTION TO MAXIMUM LIKELIHOOD ESTIMATION
The joint probability density is the product of these, 0.0062.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
Next consider the hypothesis μ = 4.0. Under this hypothesis the probability densities associated with the two observations are 0.3989 and 0.0540, and the joint probability density is 0.0215.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
4.0    0.3989   0.0540   0.0215
Under the hypothesis μ = 4.5, the probability densities are 0.3521 and 0.1295, and the joint probability density is 0.0456.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
4.0    0.3989   0.0540   0.0215
4.5    0.3521   0.1295   0.0456
Under the hypothesis μ = 5.0, the probability densities are both 0.2420 and the joint probability density is 0.0585.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
4.0    0.3989   0.0540   0.0215
4.5    0.3521   0.1295   0.0456
5.0    0.2420   0.2420   0.0585
Under the hypothesis μ = 5.5, the probability densities are 0.1295 and 0.3521 and the joint probability density is 0.0456.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
4.0    0.3989   0.0540   0.0215
4.5    0.3521   0.1295   0.0456
5.0    0.2420   0.2420   0.0585
5.5    0.1295   0.3521   0.0456
The complete joint density function for all values of μ can now be traced out. We see that it peaks at μ = 5.

μ      p(4)     p(6)     L
3.5    0.3521   0.0175   0.0062
4.0    0.3989   0.0540   0.0215
4.5    0.3521   0.1295   0.0456
5.0    0.2420   0.2420   0.0585
5.5    0.1295   0.3521   0.0456
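The table entries, and the location of the peak, can be reproduced by evaluating the joint density over a grid of candidate values of μ (a sketch, assuming σ = 1 as in the text; `joint_density` and the grid are our own construction):

```python
from math import exp, pi, sqrt

def joint_density(mu, observations=(4, 6)):
    """Joint density of the sample under N(mu, 1): product of individual densities."""
    density = 1.0
    for x in observations:
        density *= (1.0 / sqrt(2 * pi)) * exp(-0.5 * (x - mu) ** 2)
    return density

# Evaluate the likelihood on a grid of candidate values of mu
grid = [round(0.1 * k, 1) for k in range(0, 101)]   # 0.0, 0.1, ..., 10.0
best = max(grid, key=joint_density)
print(best)   # 5.0 — the sample mean, where the text says the likelihood peaks
```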
Now we will look at the mathematics of the example. If X is normally distributed with mean μ and standard deviation σ, its density function is

f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}
For the time being, we are assuming σ is equal to 1, so the density function simplifies to

f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X-\mu)^2}
Hence we obtain the probability densities for the observations where X = 4 and X = 6:

f(4) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}

f(6) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}
The joint probability density for the two observations in the sample is just the product of their individual densities:

\text{joint density} = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(4-\mu)^2}\right)\left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(6-\mu)^2}\right)
In maximum likelihood estimation we choose as our estimate of μ the value that gives us the greatest joint density for the observations in our sample. This value is associated with the greatest probability, or maximum likelihood, of obtaining the observations in the sample.
MLE AND REGRESSION ANALYSIS
MAXIMUM LIKELIHOOD ESTIMATION OF REGRESSION COEFFICIENTS
[Figure: Y plotted against X, with the value β1 + β2 X_i marked at X = X_i]
We will now apply the maximum likelihood principle to regression analysis, using the simple linear model
Y = \beta_1 + \beta_2 X + u.
The black marker shows the value that Y would have if X were equal to X_i and if there were no disturbance term.
However, we will assume that there is a disturbance term in the model and that it is normally distributed.
Relative to the black marker, the curve represents the ex ante distribution for u, that is, its potential
distribution before the observation is generated. Ex post, of course, it is fixed at some specific value.
Relative to the horizontal axis, the curve also represents the ex ante distribution for Y for that observation, that is, conditional on X = X_i.
Potential values of Y close to β1 + β2 X_i will have relatively large densities, while potential values of Y relatively far from β1 + β2 X_i will have small ones.
The mean value of the distribution of Y_i is β1 + β2 X_i. Its standard deviation is σ, the standard deviation of the disturbance term.
Hence the density function for the ex ante distribution of Y_i is

f(Y_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_i-\beta_1-\beta_2 X_i}{\sigma}\right)^2}
The joint density function for the observations on Y is the product of their individual densities.
f(Y_1)\cdots f(Y_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) \cdots \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)
Now, taking β1, β2, and σ as our choice variables, and taking the data on Y and X as given, we can reinterpret this function as the likelihood function for β1, β2, and σ:

L(\beta_1, \beta_2, \sigma \mid Y_1, \ldots, Y_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) \cdots \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)
We will choose β1, β2, and σ so as to maximize the likelihood, given the data on Y and X. As usual, it is easier to do this indirectly, maximizing the log-likelihood instead:

\log L = \log\left[\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) \cdots \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)\right]
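Working on the log scale also matters numerically: the product of many small densities underflows in floating point, while the sum of log-densities does not. A sketch of the log-likelihood as a direct sum of log-densities (the toy data are invented for illustration):

```python
from math import exp, log, pi, sqrt

def log_likelihood(b1, b2, sigma, xs, ys):
    """Log-likelihood of (b1, b2, sigma) for Y = b1 + b2*X + u, u ~ N(0, sigma^2),
    computed as a sum of log-densities rather than the log of a product."""
    return sum(
        log((1.0 / (sigma * sqrt(2 * pi)))
            * exp(-0.5 * ((y - b1 - b2 * x) / sigma) ** 2))
        for x, y in zip(xs, ys)
    )

# Toy data (invented for illustration): a perfect fit at b1 = 0, b2 = 2
xs = [1, 2, 3]
ys = [2.0, 4.0, 6.0]
print(log_likelihood(0.0, 2.0, 1.0, xs, ys))
```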
As usual, the first step is to decompose the expression as the sum of the logarithms of the factors.
\log L = \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2}\right) + \cdots + \log\left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2}\right)
Then we split the logarithm of each factor into two components. The first component is the same in each
case.
\log L = \left(\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2}\left(\frac{Y_1-\beta_1-\beta_2 X_1}{\sigma}\right)^2\right) + \cdots + \left(\log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2}\left(\frac{Y_n-\beta_1-\beta_2 X_n}{\sigma}\right)^2\right)
Hence the log-likelihood simplifies to

\log L = n \log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z, \quad \text{where } Z = (Y_1-\beta_1-\beta_2 X_1)^2 + \cdots + (Y_n-\beta_1-\beta_2 X_n)^2.
To maximize the log-likelihood, we need to minimize Z. But choosing estimators of β1 and β2 to minimize Z is exactly what we did when we derived the least squares regression coefficients.
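This equivalence can be checked numerically: compute the least squares coefficients and verify that perturbing them in any direction increases Z (and therefore lowers log L). A sketch; the data and the `ols` helper are invented for illustration:

```python
def ols(xs, ys):
    """Least squares estimates b1, b2 for Y = b1 + b2*X + u."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
         sum((x - xbar) ** 2 for x in xs)
    b1 = ybar - b2 * xbar
    return b1, b2

def Z(b1, b2, xs, ys):
    """Sum of squared deviations: the quantity the log-likelihood penalizes."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))

# Toy data (invented for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, b2 = ols(xs, ys)
# Any perturbation of the OLS estimates increases Z, hence lowers log L
assert Z(b1, b2, xs, ys) <= Z(b1 + 0.1, b2, xs, ys)
assert Z(b1, b2, xs, ys) <= Z(b1, b2 - 0.1, xs, ys)
```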
Thus, for this regression model, the maximum likelihood estimators of β1 and β2 are identical to the least squares estimators.
As a consequence, Z will be the sum of the squares of the least squares residuals.
Z = \sum e_i^2, \quad \text{where } e_i = Y_i - b_1 - b_2 X_i.
To obtain the maximum likelihood estimator of σ, it is convenient to rearrange the log-likelihood function as follows:

\log L = n \log\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z = n \log\frac{1}{\sigma} + n \log\frac{1}{\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z = -n \log\sigma + n \log\frac{1}{\sqrt{2\pi}} - \frac{1}{2\sigma^2} Z
Differentiating it with respect to σ, we obtain

\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{Z}{\sigma^3} = \frac{1}{\sigma^3}\left(Z - n\sigma^2\right)
The first order condition for a maximum requires this to be equal to zero. Hence the maximum likelihood estimator of the variance is the sum of the squares of the residuals divided by n:

\hat{\sigma}^2 = \frac{Z}{n} = \frac{\sum e_i^2}{n}
Note that this estimator is biased in finite samples. To obtain an unbiased estimator, we should divide by n − k, where k is the number of parameters, in this case 2. However, the bias disappears as the sample size becomes large.
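The n versus n − k distinction is easy to see in code (a sketch; the residuals are invented for illustration and sum to zero, as least squares residuals do):

```python
def variance_estimates(residuals, k=2):
    """MLE (divide by n) versus unbiased (divide by n - k) estimates of sigma^2."""
    n = len(residuals)
    z = sum(e ** 2 for e in residuals)
    return z / n, z / (n - k)

# Residuals from some fitted line (invented for illustration)
e = [0.3, -0.5, 0.1, 0.4, -0.3]
mle, unbiased = variance_estimates(e)
print(mle, unbiased)   # the MLE is always the smaller of the two
```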
APPLICATIONS OF MLE
Probit and Logit Models
(Additional) References

Cramer, J.S., An Introduction to the Logit Model for Economists, 2nd ed., 2000, Timberlake Consultants Ltd (Chapter 2).
Hill, Griffiths, and Judge, Undergraduate Econometrics, 2nd ed., 2001 (Chapter 12).
Johnston, J., and DiNardo, J., Econometric Methods, 4th ed., 1997, McGraw-Hill (Chapter 13).
Lye, Jenny, Limited Dependent Variables, handout, University of Melbourne, 2006.
Vahid, Farshid, Applied Econometrics, Section A: Introduction to Microeconometrics, handout, Monash University, Australia, 2002.
Winkelmann and Boes, Analysis of Microdata, 2006 (Chapters 1–4).