
Statistical Inference

Classical and Bayesian Methods


Class 3
AMS-UCSC
Jan 17, 2012
Topics

We will talk about...

1 Properties of Maximum Likelihood Estimators
2 Maximum Likelihood Estimators for multi-parametric models
3 Sufficient Statistics
Properties of Maximum Likelihood Estimators
Invariance
Example: Lifetimes of electronic components

In the previous example about the lifetimes of electronic components, the
parameter $\theta$ was interpreted as the failure rate of the electronic
components (i.e., the number of failures per year).
The inverse $\mu = 1/\theta$ is the average lifetime of the electronic
components.
How can we calculate the MLE of $\mu$? Is there any relationship between
the MLE of $\theta$ and the MLE of $\mu$?
The answer is YES, and it is explained in the following theorem.
Properties of Maximum Likelihood Estimators
Invariance property of MLEs

Theorem (Invariance)
If $\hat{\theta}$ is the maximum likelihood estimator of $\theta$ and if
$g$ is a one-to-one function, then $g(\hat{\theta})$ is an MLE of
$g(\theta)$.

Example:
In the example about the lifetimes of components, we computed the observed
MLE value $\hat{\theta} = 0.455$. Following the theorem, the observed
value of $\hat{\mu}$ would be $1/0.455 \approx 2.198$.
Note: This invariance property can be extended to functions that are not
one-to-one.
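As a quick numerical check of the theorem, here is a minimal Python sketch
assuming an exponential lifetime model; the data below are hypothetical,
not the data behind the 0.455 figure above. Maximizing the likelihood
written directly in terms of $\mu = 1/\theta$ gives the same answer as
applying $g(\hat{\theta}) = 1/\hat{\theta}$.

```python
# A minimal sketch of the invariance property, assuming an exponential
# lifetime model with hypothetical data (not the slides' original data).
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([2.1, 0.8, 3.5, 1.2, 4.0])  # hypothetical lifetimes (years)
theta_hat = 1.0 / x.mean()               # MLE of the failure rate theta
mu_hat = 1.0 / theta_hat                 # invariance: MLE of mu = 1/theta

# Negative log-likelihood written directly in terms of mu = 1/theta:
# log L(mu) = -n*log(mu) - sum(x)/mu
neg_loglik = lambda mu: len(x) * np.log(mu) + x.sum() / mu
res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")

print(mu_hat, res.x)  # both equal the sample mean, as the theorem predicts
```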
Properties of Maximum Likelihood Estimators
Consistency property of MLEs
Convergence of a sequence of MLEs

Consistency property
Consider a random sample taken from a distribution with parameter $\theta$.
Suppose that for every sample of size $n$ greater than some given minimum,
there exists a unique MLE of $\theta$. Then, under certain conditions, the
sequence of MLEs is a consistent sequence of estimators of $\theta$. This
means that the sequence of MLEs converges in probability to the unknown
value of $\theta$ as $n \to \infty$.
Note: The conditions required to prove this result will not be detailed
here, but they are typically satisfied in most practical problems.
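The convergence can be illustrated with a short simulation, a sketch under
the assumption of an exponential model with a known true rate:

```python
# A small simulation sketch of consistency (assumed exponential model):
# the MLE 1/xbar of the failure rate drifts toward the true value as n grows.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0  # assumed true failure rate
for n in [10, 100, 1000, 10000]:
    x = rng.exponential(scale=1 / theta_true, size=n)
    print(n, 1 / x.mean())  # MLE of theta for each sample size
```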
Properties of Maximum Likelihood Estimators
Estimating MLEs by numerical computations
Example: Sampling from a Gamma distribution

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a Gamma
distribution with pdf:
$$f(x \mid \alpha) = \frac{1}{\Gamma(\alpha)} x^{\alpha - 1} e^{-x}$$
Suppose that $\alpha$ is unknown ($\alpha > 0$). The likelihood function is:
$$f_n(\mathbf{x} \mid \alpha) = \frac{1}{\Gamma(\alpha)^n} \left( \prod_{i=1}^{n} x_i \right)^{\alpha - 1} \exp\left( -\sum_{i=1}^{n} x_i \right)$$
The MLE of $\alpha$ will satisfy the equation:
$$\frac{\partial \log f_n(\mathbf{x} \mid \alpha)}{\partial \alpha} = 0$$
Properties of Maximum Likelihood Estimators
Estimating MLEs by numerical computations
Example: Sampling from a Gamma distribution (Cont.)

When we take the derivative of the log-likelihood function with respect to
$\alpha$, we get the following equation:
$$\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$
The function $\Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function.
The unique value of $\alpha$ that satisfies the above equation will be the
MLE of $\alpha$. This value can be determined using a numerical method such
as Newton's method, or by using the tables of the digamma function
available in several mathematical packages.
Properties of Maximum Likelihood Estimators
Estimating MLEs by numerical computations
Newton's method

Let $f(\theta)$ be a real-valued function. Newton's method is used to find
a solution of the equation:
$$f(\theta) = 0$$
We pick an initial value $\theta_0$. Newton's method works by updating the
initial guess with the equation:
$$\theta_1 = \theta_0 - \frac{f(\theta_0)}{f'(\theta_0)}$$
We continue iterating until the results stabilize to a given value, as in
the sketch below.
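A minimal Python sketch of this update rule (the names `f`, `fprime`, and
the tolerance are illustrative choices, not part of the slides):

```python
# A generic sketch of Newton's method as described above.
def newton(f, fprime, theta0, tol=1e-8, max_iter=100):
    """Update theta <- theta - f(theta)/f'(theta) until it stabilizes."""
    theta = theta0
    for _ in range(max_iter):
        step = f(theta) / fprime(theta)
        theta = theta - step
        if abs(step) < tol:  # the iterates have stabilized
            return theta
    return theta
```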
Properties of Maximum Likelihood Estimators
Estimating MLEs by numerical computations
Example: Newton's method

Suppose we observed $n = 20$ random variables $X_1, X_2, \ldots, X_{20}$
from a Gamma distribution with parameters $\alpha$ and $\beta = 1$. Suppose
that $\frac{1}{n} \sum_{i=1}^{n} \log x_i = 1.22$ and
$\frac{1}{n} \sum_{i=1}^{n} x_i = 3.679$. Since
$E[X_i] = \alpha/\beta = \alpha$, this suggests the initial value
$\alpha_0 = 3.679$. We want to find $\alpha$ such that
$\psi(\alpha) = 1.22$, or $f(\alpha) = \psi(\alpha) - 1.22 = 0$. The
derivative of $f(\alpha)$ is $\psi'(\alpha)$, which is the trigamma
function. The first iterate in Newton's method is:
$$\alpha_1 = \alpha_0 - \frac{\psi(\alpha_0) - 1.22}{\psi'(\alpha_0)} = 3.679 - \frac{1.1607 - 1.22}{0.3120} = 3.871$$
After a few iterations the value stabilizes at 3.876.
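This iteration can be reproduced with the digamma and trigamma functions
from `scipy.special`, reusing the `newton` sketch defined earlier (a check
of the quoted numbers, not the original computation):

```python
# Reproducing the example with scipy.special's digamma/trigamma functions.
from scipy.special import digamma, polygamma

f = lambda a: digamma(a) - 1.22     # psi(alpha) minus the mean of log x_i
fprime = lambda a: polygamma(1, a)  # trigamma, the derivative of digamma

print(newton(f, fprime, theta0=3.679))  # stabilizes near 3.876
```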
Maximum Likelihood Estimators for multi-parametric models
Multi-parametric models
Example: Normal distribution with unknown mean and variance

Suppose that $X_1, X_2, \ldots, X_n$ is a sample from a Normal distribution
with mean $\mu$ and variance $\sigma^2$, both unknown. The parameter vector
is $\theta = (\mu, \sigma^2)$. The likelihood function
$f_n(\mathbf{x} \mid \mu, \sigma^2)$ is again given by the equation:
$$f_n(\mathbf{x} \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right]$$
This function must be maximized over all possible values of $\mu$ and
$\sigma^2$, where $-\infty < \mu < \infty$ and $\sigma^2 > 0$. Again, it is
easier to maximize $\log f_n(\mathbf{x} \mid \mu, \sigma^2)$.
Maximum Likelihood Estimators for multi-parametric models
Multi-parametric models
Example: Normal distribution with unknown mean and variance (Cont.)

The steps to maximize $L(\theta)$ are as follows:
1 Write the log-likelihood:
$$L(\theta) = \log f_n(\mathbf{x} \mid \mu, \sigma^2) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$
2 Assume $\sigma^2$ is fixed and find the value $\hat{\mu}(\sigma^2)$ that maximizes $L(\theta)$.
3 Find the value $\hat{\sigma}^2$ of $\sigma^2$ that maximizes $L(\hat{\theta})$, where $\hat{\theta} = (\hat{\mu}(\sigma^2), \sigma^2)$.
4 Set the MLE estimator as the random vector $(\hat{\mu}(\hat{\sigma}^2), \hat{\sigma}^2)$.
Maximum Likelihood Estimators for multi-parametric models
Multi-parametric models
Example: Normal distribution with unknown mean and variance (Cont.)

Second step: The value $\hat{\mu}(\sigma^2)$ was found by maximizing the
likelihood with $\sigma^2$ fixed or known. The solution was
$\hat{\mu}(\sigma^2) = \bar{x}_n$.
Third step: We set $\hat{\theta} = (\bar{x}_n, \sigma^2)$ and maximize
$$L(\hat{\theta}) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$
We take the derivative with respect to $\sigma^2$, set it equal to 0, and
solve for $\sigma^2$. The derivative is:
$$\frac{d}{d\sigma^2} L(\hat{\theta}) = -\frac{n}{2} \frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$
Setting this derivative to zero and solving for $\sigma^2$, we get:
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$
Maximum Likelihood Estimators for multi-parametric models
Multi-parametric models
Example: Normal distribution with unknown mean and variance (Cont.)

Fourth step: The maximum likelihood estimator is set as
$(\hat{\mu}(\hat{\sigma}^2), \hat{\sigma}^2)$:
$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}^2) = \left( \bar{X}_n, \; \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \right)$$
Notes:
The second derivative of $L(\hat{\theta})$ is negative at the value of
$\hat{\sigma}^2$ found; therefore a maximum has been reached.
The first coordinate of the MLE is the sample mean.
The second coordinate is the sample variance.
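A short sketch, with hypothetical data, comparing the closed-form MLE above
against a direct numerical maximization of the log-likelihood:

```python
# Checking the closed-form MLE against numerical maximization
# (the sample below is hypothetical).
import numpy as np
from scipy.optimize import minimize

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5])  # hypothetical sample
mu_hat, var_hat = x.mean(), x.var()  # sample mean and variance (1/n divisor)

def neg_loglik(theta):
    mu, var = theta
    # negative of the log-likelihood L(theta) written above
    return 0.5 * len(x) * np.log(2 * np.pi * var) + ((x - mu) ** 2).sum() / (2 * var)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-9, None)])
print((mu_hat, var_hat), res.x)  # the two answers agree
```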
Sufficient Statistics
Definition of a Sufficient Statistic
Example: Lifetimes of electronic components

In this example the MLE of the mean lifetime of electronic components was
found. Note that in this estimation procedure we made use of the data only
through the value of the statistic $X_1 + X_2 + X_3$.
This is an example of a sufficient statistic. This concept was introduced
by Fisher in 1922. In this example there is a function
$T = r(X_1, X_2, \ldots, X_n)$ that summarizes all the information in the
random sample. Knowledge of the individual values of the sample is not
always necessary to get a good estimator of $\theta$. A statistic $T$
having this property is called a sufficient statistic.
Sufficient Statistics
Definition of a Sufficient Statistic

Sufficient statistic
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution indexed
by a parameter $\theta$. Let $T$ be a statistic. Suppose that for every
$\theta$ and every possible value $t$ of $T$, the conditional distribution
of $X_1, X_2, \ldots, X_n$ given $T = t$ and $\theta$ depends on $t$ but
not on $\theta$. This means that the conditional distribution of
$X_1, X_2, \ldots, X_n$ given $T = t$ and $\theta$ is the same for all
values of $\theta$. We say that $T$ is a sufficient statistic for the
parameter $\theta$.
Suppose you know $T$ and can simulate random variables
$X'_1, X'_2, \ldots, X'_n$ such that, for every $\theta$, their joint
distribution is the same as the joint distribution of
$X_1, X_2, \ldots, X_n$. The statistic $T$ is sufficient in the sense that
one can use $X'_1, X'_2, \ldots, X'_n$ in the same way as
$X_1, X_2, \ldots, X_n$.
Sufficient Statistics
Factorization criterion
How to find a sufficient statistic?

This method was developed by R. A. Fisher in 1922, J. Neyman in 1935, and
P. R. Halmos and L. J. Savage in 1949.
Theorem: Factorization criterion
Let $X_1, X_2, \ldots, X_n$ form a random sample from either a continuous
or a discrete distribution for which the pdf or the pf is
$f(x \mid \theta)$. The value of $\theta$ is unknown and belongs to a
parameter space $\Omega$. A statistic $T = r(X_1, X_2, \ldots, X_n)$ is a
sufficient statistic if and only if the joint pdf or pf
$f_n(\mathbf{x} \mid \theta)$ can be factored as
$$f_n(\mathbf{x} \mid \theta) = u(\mathbf{x}) \, v[r(\mathbf{x}), \theta]$$
for all values of $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$
and $\theta \in \Omega$.
The functions $u$ and $v$ are non-negative. $u$ depends on $\mathbf{x}$
but does not depend on $\theta$. $v$ depends on $\theta$ and depends on
$\mathbf{x}$ only through the value of the statistic $r(\mathbf{x})$.
Sufficient Statistics
Factorization criterion
Example 1: Sampling from a Poisson distribution

Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from
a Poisson distribution for which the true value of the mean $\theta$ is
unknown ($\theta > 0$). Let $r(\mathbf{x}) = \sum_{i=1}^{n} x_i$. We can
prove that $T = r(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a sufficient
statistic for $\theta$. The joint pf $f_n(\mathbf{x} \mid \theta)$ is:
$$f_n(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} \frac{e^{-\theta} \theta^{x_i}}{x_i!} = \left( \prod_{i=1}^{n} \frac{1}{x_i!} \right) e^{-n\theta} \theta^{r(\mathbf{x})}$$
Let $u(\mathbf{x}) = \prod_{i=1}^{n} \frac{1}{x_i!}$ and
$v(t, \theta) = e^{-n\theta} \theta^{t}$. This means that
$f_n(\mathbf{x} \mid \theta)$ has been factored as in the theorem. It
follows that $T = \sum_{i=1}^{n} X_i$ is a sufficient statistic for
$\theta$.
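A small sketch of what the factorization means in practice, with
hypothetical counts: two samples with the same total $t = \sum x_i$ have
likelihoods that differ only by the constant factor $u(\mathbf{x})$, so
their ratio is free of $\theta$.

```python
# Two Poisson samples with the same total sum(x_i) = 8: their likelihood
# ratio is constant in theta, equal to u(x1)/u(x2). (Hypothetical data.)
import numpy as np
from scipy.stats import poisson

x1 = np.array([1, 4, 0, 3])  # sum = 8
x2 = np.array([2, 2, 2, 2])  # sum = 8

for theta in [0.5, 1.0, 2.0, 4.0]:
    L1 = poisson.pmf(x1, theta).prod()
    L2 = poisson.pmf(x2, theta).prod()
    print(theta, L1 / L2)  # same value every time: no theta in the ratio
```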
Sufficient Statistics
Factorization criterion
Example 2: Sampling from a continuous distribution

Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from
a continuous distribution with pdf:
$$f(x \mid \theta) = \begin{cases} \theta x^{\theta - 1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
We will show that $T = \prod_{i=1}^{n} X_i$ is a sufficient statistic for
$\theta$. For $0 < x_i < 1$ ($i = 1, \ldots, n$) the joint pdf is:
$$f_n(\mathbf{x} \mid \theta) = \theta^n \left( \prod_{i=1}^{n} x_i \right)^{\theta - 1} = \theta^n [r(\mathbf{x})]^{\theta - 1}$$
If at least one value of $x_i$ is outside the interval $0 < x_i < 1$, then
$f_n(\mathbf{x} \mid \theta) = 0$.
If we set $u(\mathbf{x}) = 1$ and $v(t, \theta) = \theta^n t^{\theta - 1}$,
then $f_n(\mathbf{x} \mid \theta)$ is factored as in the theorem, and
$T = \prod_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.
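Here the MLE itself can be written as a function of the sufficient
statistic alone: solving $\partial \log f_n / \partial \theta = 0$ for this
likelihood gives $\hat{\theta} = -n / \log t$ with $t = \prod x_i$ (a small
extension of the slide, not stated in it). A sketch with hypothetical data:

```python
# The MLE depends on the data only through t = prod(x_i).
# (The closed form theta_hat = -n/log(t) follows from the likelihood above.)
import numpy as np

x = np.array([0.9, 0.4, 0.7, 0.8])  # hypothetical sample in (0, 1)
t = x.prod()                         # sufficient statistic T
theta_hat = -len(x) / np.log(t)      # MLE, a function of t alone
print(t, theta_hat)
```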
Thanks for your attention ...
