Theorem (invariance): If \(\hat\theta\) is an MLE of \(\theta\) and \(g\) is a one-to-one function, then \(g(\hat\theta)\) is an MLE of \(g(\theta)\).

Example: In the example about the lifetimes of components, we computed the observed MLE value \(\hat\theta = 0.455\). Following the theorem, the observed value of \(1/\hat\theta\) would be \(1/0.455 \approx 2.198\).

Note: This invariance property can be extended to functions that are not one-to-one.
Winter 2012. Session 1 (Class 3) AMS-133/206 Jan 17, 2012 4 / 19
Properties of Maximum Likelihood Estimators
Consistency property of MLEs
Convergence of a sequence of MLEs

Consistency property
Consider a random sample taken from a distribution with parameter \(\theta\). Suppose that for every sample of size \(n\) greater than some given minimum there exists a unique MLE of \(\theta\). Then, under certain conditions, the sequence of MLEs is a consistent sequence of estimators of \(\theta\). This means that the MLE sequence converges in probability to the unknown value of \(\theta\) as \(n \to \infty\).

Note: The conditions required to prove this result will not be detailed here, but they are typically satisfied in most practical problems.
Estimating MLEs by numerical computations
Example: Sampling from a Gamma distribution
Suppose that \(X_1, X_2, \dots, X_n\) is a random sample from a Gamma distribution with pdf:
\[ f(x \mid \alpha) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x} \]
Suppose that \(\alpha\) is unknown (\(\alpha > 0\)). The likelihood function is:
\[ f_n(\mathbf{x} \mid \alpha) = \frac{1}{\Gamma(\alpha)^n} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\left( -\sum_{i=1}^{n} x_i \right) \]
The MLE of \(\alpha\) will satisfy the equation:
\[ \frac{\partial}{\partial \alpha} \log f_n(\mathbf{x} \mid \alpha) = 0 \]
Example: Sampling from a Gamma distribution (Cont.)
When we take the derivative of the log-likelihood with respect to \(\alpha\) and set it to zero, we get the following equation:
\[ \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = \frac{1}{n} \sum_{i=1}^{n} \log x_i \]
The function \(\Gamma'(\alpha)/\Gamma(\alpha)\) is the digamma function, denoted \(\psi(\alpha)\).
The unique value of \(\alpha\) that satisfies the above equation will be the MLE of \(\alpha\). This value can be determined using a numerical method such as Newton's method, or by using the tables of the digamma function available in several mathematical packages.
Newton's method
Let \(f(\theta)\) be a real-valued function. Newton's method is used to find a solution of the equation:
\[ f(\theta) = 0 \]
We choose an initial value \(\theta = \theta_0\). Newton's method updates the current guess with the equation:
\[ \theta_1 = \theta_0 - \frac{f(\theta_0)}{f'(\theta_0)} \]
We continue iterating until the results stabilize to a given value.
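The update rule above can be sketched in a few lines of Python. The function name `newton` and the example equation \(\theta^2 - 2 = 0\) are illustrative choices, not from the slides:

```python
def newton(f, fprime, theta0, tol=1e-10, max_iter=100):
    """Find a root of f(theta) = 0 by Newton's method, starting at theta0."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / fprime(theta)
        theta -= step
        if abs(step) < tol:  # stop once the iterates stabilize
            break
    return theta

# Illustrative use: solve theta^2 - 2 = 0 (i.e., find sqrt(2)) from theta0 = 1
root = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0)
```

Each iteration replaces the function by its tangent line at the current guess and jumps to where that line crosses zero, which gives very fast convergence near a simple root.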
Example: Newton's method
Suppose we observed \(n = 20\) random variables \(X_1, X_2, \dots, X_{20}\) from a Gamma distribution with parameters \(\alpha\) and \(\beta = 1\). Suppose that \(\frac{1}{n}\sum_{i=1}^{n} \log x_i = 1.22\) and \(\frac{1}{n}\sum_{i=1}^{n} x_i = 3.679\). Since \(E[X_i] = \alpha/\beta = \alpha\), this suggests the initial value \(\alpha_0 = 3.679\). We want to find \(\alpha\) such that \(\psi(\alpha) = 1.22\), or \(f(\alpha) = \psi(\alpha) - 1.22 = 0\). The derivative of \(f(\alpha)\) is \(\psi'(\alpha)\), which is the trigamma function. The first iterate in Newton's method is:
\[ \alpha_1 = \alpha_0 - \frac{\psi(\alpha_0) - 1.22}{\psi'(\alpha_0)} = 3.679 - \frac{1.1607 - 1.22}{0.3120} = 3.871 \]
After a few iterations the value stabilizes at 3.876.
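A minimal sketch of this iteration, assuming SciPy is available for the special functions (`scipy.special.digamma` for \(\psi\) and `scipy.special.polygamma(1, .)` for the trigamma \(\psi'\)):

```python
from scipy.special import digamma, polygamma  # polygamma(1, a) is the trigamma

target = 1.22   # observed value of (1/n) * sum(log x_i)
alpha = 3.679   # initial value suggested by the sample mean
for _ in range(20):
    # Newton update: alpha <- alpha - f(alpha) / f'(alpha)
    alpha = alpha - (digamma(alpha) - target) / polygamma(1, alpha)
# alpha now solves digamma(alpha) = 1.22 to machine precision
```

The loop converges after only a handful of iterations because Newton's method is quadratically convergent near the root.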
Maximum Likelihood Estimators for multi-parametric models
Multi-parametric models
Example: Normal Distribution with unknown mean and variance
Suppose that \(X_1, X_2, \dots, X_n\) is a sample from a Normal distribution with mean \(\mu\) and variance \(\sigma^2\), both unknown. The parameter vector is \(\theta = (\mu, \sigma^2)\). The likelihood function \(f_n(\mathbf{x} \mid \mu, \sigma^2)\) will again be given by the equation:
\[ f_n(\mathbf{x} \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right] \]
This function must be maximized over all possible values of \(\mu\) and \(\sigma^2\), where \(-\infty < \mu < \infty\) and \(\sigma^2 > 0\). Again, it is easier to maximize \(\log f_n(\mathbf{x} \mid \mu, \sigma^2)\).
Example: Normal Distribution with unknown mean and variance (Cont.)
The steps to maximize \(L(\theta)\) are as follows:
1. Write the log-likelihood:
\[ L(\theta) = \log f_n(\mathbf{x} \mid \mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \]
2. Assume \(\sigma^2\) is fixed and find the value \(\hat\mu(\sigma^2)\) that maximizes \(L(\theta)\).
3. Find the value \(\hat\sigma^2\) of \(\sigma^2\) that maximizes \(L(\theta^*)\), where \(\theta^* = (\hat\mu(\sigma^2), \sigma^2)\).
4. Set the MLE estimator as the random vector \((\hat\mu(\hat\sigma^2), \hat\sigma^2)\).
Example: Normal Distribution with unknown mean and variance (Cont.)
Second step: The value \(\hat\mu(\sigma^2)\) was found when the likelihood was maximized for \(\sigma^2\) fixed or known. The solution was \(\hat\mu(\sigma^2) = \bar{x}_n\).
Third step: We set \(\theta^* = (\bar{x}_n, \sigma^2)\) and maximize
\[ L(\theta^*) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 \]
We take the derivative with respect to \(\sigma^2\), set it equal to 0, and solve for \(\sigma^2\). The derivative is:
\[ \frac{d}{d\sigma^2} L(\theta^*) = -\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 \]
Setting this derivative to zero and solving for \(\sigma^2\) we get:
\[ \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 \]
Example: Normal Distribution with unknown mean and variance (Cont.)
Fourth step: The maximum likelihood estimator is set as \((\hat\mu(\hat\sigma^2), \hat\sigma^2)\):
\[ \hat\theta = (\hat\mu, \hat\sigma^2) = \left( \bar{X}_n,\ \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2 \right) \]
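These two estimators are easy to check numerically. A minimal sketch using a simulated sample (the sample below is hypothetical and only for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)  # simulated N(5, 4) sample

mu_hat = x.mean()                        # MLE of mu: the sample mean
sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of sigma^2: divides by n, not n - 1
```

Note that `sigma2_hat` divides by \(n\), so it is the MLE but not the unbiased sample variance, which divides by \(n - 1\).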
Note: Given only the value of a sufficient statistic \(T\), one can construct auxiliary random variables \(X'_1, X'_2, \dots, X'_n\) such that, for every \(\theta\), the joint distribution of \(X'_1, X'_2, \dots, X'_n\) is the same as the joint distribution of \(X_1, X_2, \dots, X_n\). The statistic \(T\) is sufficient in the sense that one can use \(X'_1, X'_2, \dots, X'_n\) in the same way as \(X_1, X_2, \dots, X_n\).
Sufficient Statistics: Factorization criterion
How to find a sufficient statistic?
This method was developed by R.A. Fisher in 1922, J. Neyman in 1935, and P.R. Halmos and L.J. Savage in 1949.
Theorem: Factorization criterion
Let \(X_1, X_2, \dots, X_n\) form a random sample from either a continuous or a discrete distribution for which the pdf or pf is \(f(x \mid \theta)\). The value of \(\theta\) is unknown and belongs to a parameter space \(\Omega\). A statistic \(T = r(X_1, X_2, \dots, X_n)\) is a sufficient statistic if and only if the joint pdf or pf \(f_n(\mathbf{x} \mid \theta)\) can be factored as
\[ f_n(\mathbf{x} \mid \theta) = u(\mathbf{x})\, v[r(\mathbf{x}), \theta] \]
for all values of \(\mathbf{x} = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n\) and \(\theta \in \Omega\).
The functions \(u\) and \(v\) are non-negative. \(u\) depends on \(\mathbf{x}\) but does not depend on \(\theta\); \(v\) depends on \(\theta\) and depends on \(\mathbf{x}\) only through the value of the statistic \(r(\mathbf{x})\).
Example 1: Sampling from a Poisson distribution
Suppose that \(\mathbf{X} = (X_1, X_2, \dots, X_n)\) is a random sample from a Poisson distribution for which the true value of the mean \(\lambda\) is unknown (\(\lambda > 0\)). Let \(r(\mathbf{x}) = \sum_{i=1}^{n} x_i\). We can prove that \(T = r(\mathbf{X}) = \sum_{i=1}^{n} X_i\) is a sufficient statistic for \(\lambda\). The joint pf \(f_n(\mathbf{x} \mid \lambda)\) is:
\[ f_n(\mathbf{x} \mid \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \left( \prod_{i=1}^{n} \frac{1}{x_i!} \right) e^{-n\lambda} \lambda^{r(\mathbf{x})} \]
Let \(u(\mathbf{x}) = \prod_{i=1}^{n} \frac{1}{x_i!}\) and \(v(t, \lambda) = e^{-n\lambda} \lambda^{t}\). This means that \(f_n(\mathbf{x} \mid \lambda)\) has been factored as in the theorem. It follows that \(T = \sum_{i=1}^{n} X_i\) is a sufficient statistic for \(\lambda\).
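The Poisson factorization can be verified numerically on a small hypothetical sample; the values of `lam` and `x` below are made up for illustration:

```python
import math

lam = 2.5
x = [3, 1, 4, 0, 2]   # hypothetical Poisson observations
n, t = len(x), sum(x)  # t = r(x), the sufficient statistic

# Joint pf computed term by term
joint = math.prod(math.exp(-lam) * lam ** xi / math.factorial(xi) for xi in x)

u = math.prod(1 / math.factorial(xi) for xi in x)  # u(x): free of lambda
v = math.exp(-n * lam) * lam ** t                  # v(t, lambda)

assert math.isclose(joint, u * v)  # f_n(x | lambda) = u(x) * v(t, lambda)
```

Changing `lam` leaves `u` unchanged and only rescales `v`, which is exactly why the sum of the observations carries all the information about \(\lambda\).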
Example 2: Sampling from a continuous distribution
Suppose that \(\mathbf{X} = (X_1, X_2, \dots, X_n)\) is a random sample from a continuous distribution with pdf:
\[ f(x \mid \theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]
We will show that \(T = \prod_{i=1}^{n} X_i\) is a sufficient statistic for \(\theta\). For \(0 < x_i < 1\) \((i = 1, \dots, n)\) the joint pdf is:
\[ f_n(\mathbf{x} \mid \theta) = \theta^n \left( \prod_{i=1}^{n} x_i \right)^{\theta - 1} = \theta^n [r(\mathbf{x})]^{\theta - 1} \]
If at least one value of \(x_i\) is outside the interval \(0 < x_i < 1\), then \(f_n(\mathbf{x} \mid \theta) = 0\). If we set \(u(\mathbf{x}) = 1\) and \(v(t, \theta) = \theta^n t^{\theta - 1}\), then \(f_n(\mathbf{x} \mid \theta)\) is factored as in the theorem and \(T = \prod_{i=1}^{n} X_i\) is a sufficient statistic for \(\theta\).
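This factorization can also be checked numerically; the values of `theta` and `x` below are hypothetical, chosen only to illustrate the identity:

```python
import math

theta = 2.5
x = [0.3, 0.8, 0.5, 0.9]  # hypothetical observations in (0, 1)
n = len(x)
t = math.prod(x)          # r(x) = product of the x_i

# Joint pdf computed term by term
joint = math.prod(theta * xi ** (theta - 1) for xi in x)

v = theta ** n * t ** (theta - 1)  # v(t, theta); here u(x) = 1

assert math.isclose(joint, v)  # f_n(x | theta) = u(x) * v(t, theta)
```

Here the whole joint pdf depends on the data only through the product of the observations, so \(u(\mathbf{x}) = 1\) suffices.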
Thanks for your attention ...