C-R lower bound, Fisher Information

February 26, 2009


Some identities related to the log likelihood functions

Since we always have, for a family of density functions f(x, θ),
\[
\int f(x, \theta)\,dx = 1
\]
(true for all θ), taking the derivative of both sides with respect to θ, and assuming we can take the derivative inside the integral (this requires some conditions), we have
\[
\int \frac{\partial f(x, \theta)}{\partial \theta}\,dx = 0.
\]
Multiplying and dividing inside the integral by f(x, θ), we can rewrite the equality as
\[
\int \frac{\partial f(x, \theta)/\partial \theta}{f(x, \theta)}\, f(x, \theta)\,dx = 0,
\]
which is
\[
E\left[\frac{\partial}{\partial \theta} \log f(X, \theta)\right] = 0. \tag{1}
\]
Taking one more derivative (and exchanging the integral and derivative again), we have
\[
\int \left\{ \frac{\partial^2}{\partial \theta^2} \log f(x, \theta)\, f(x, \theta) + \left[\frac{\partial}{\partial \theta} \log f(x, \theta)\right] \frac{\partial f(x, \theta)}{\partial \theta} \right\} dx = 0.
\]
Multiplying and dividing the second term inside the integral by f(x, θ), that is
\[
E\left[\frac{\partial^2}{\partial \theta^2} \log f(X, \theta)\right] + E\left[\left(\frac{\partial}{\partial \theta} \log f(X, \theta)\right)^2\right] = 0. \tag{2}
\]
We shall call these the Bartlett identities one and two. More derivatives will produce more identities.
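As a quick numerical sanity check of these two identities (a sketch added here, not part of the original derivation; it assumes the Exponential(θ) family f(x, θ) = θ e^{-θx}, for which ∂/∂θ log f = 1/θ − x and ∂²/∂θ² log f = −1/θ²):

\begin{verbatim}
import numpy as np

# Monte Carlo check of the two Bartlett identities for the Exponential(theta)
# family f(x, theta) = theta * exp(-theta * x).
rng = np.random.default_rng(0)
theta = 2.0
x = rng.exponential(scale=1.0 / theta, size=200_000)  # draws from f(., theta)

score = 1.0 / theta - x              # d/dtheta log f(x, theta)
second = -1.0 / theta ** 2           # d^2/dtheta^2 log f(x, theta), constant in x here

print(np.mean(score))                # identity (1): approximately 0
print(second + np.mean(score ** 2))  # identity (2): approximately 0
\end{verbatim}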
Cramer-Rao Lower Bound
Now suppose we have an unbiased estimator \(\hat{\theta}\) of θ, i.e.
\[
E\,\hat{\theta}(X) = \theta
\]
(here x can be a vector); that is,
\[
\int \hat{\theta}(x)\, f(x, \theta)\,dx = \theta.
\]
Taking the derivative with respect to θ on both sides (and assuming we can take the derivative inside the integral),
\[
\int \hat{\theta}(x)\, \frac{\partial}{\partial \theta} f(x, \theta)\,dx = 1
\]
(by definition, the estimator \(\hat{\theta}\) cannot depend on the parameter θ; it is a function of the data x). Multiply and divide by f(x, θ):
\[
\int \hat{\theta}(x)\, \frac{\partial \log f(x, \theta)}{\partial \theta}\, f(x, \theta)\,dx = 1.
\]
Write it as an expectation:
\[
E\left[\hat{\theta}(X)\, \frac{\partial \log f(X, \theta)}{\partial \theta}\right] = 1.
\]
We know
\[
E\left[\frac{\partial \log f(X, \theta)}{\partial \theta}\right] = 0
\]
(first identity), therefore the above is also
\[
\mathrm{cov}\left(\hat{\theta}(X),\; \frac{\partial \log f(X, \theta)}{\partial \theta}\right) = 1.
\]
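As an added illustration (not from the original notes), this covariance identity can be checked by simulation for a concrete model, say Y_1, …, Y_n ∼ N(θ, σ²) with the unbiased estimator Ȳ; the score of the whole sample is then Σ(y_i − θ)/σ²:

\begin{verbatim}
import numpy as np

# Monte Carlo check that cov(theta_hat(X), d/dtheta log f(X, theta)) = 1
# for Y_1, ..., Y_n ~ N(theta, sigma^2) with theta_hat = sample mean.
rng = np.random.default_rng(1)
theta, sigma, n, reps = 1.0, 3.0, 30, 100_000

y = rng.normal(theta, sigma, size=(reps, n))
theta_hat = y.mean(axis=1)                     # unbiased estimator of theta
score = (y - theta).sum(axis=1) / sigma ** 2   # score of the whole sample

print(np.cov(theta_hat, score)[0, 1])          # approximately 1
\end{verbatim}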
Recall cov(X, Y) = E[(X − EX)(Y − EY)] = E[X(Y − EY)]. Using the inequality [cov(X, Y)]² ≤ Var X · Var Y (the Cauchy-Schwarz inequality), we have
\[
\mathrm{Var}\,\hat{\theta}(X)\,\cdot\,\mathrm{Var}\left(\frac{\partial \log f(X, \theta)}{\partial \theta}\right) \ge 1.
\]
Another way to write it is
\[
\mathrm{Var}\,\hat{\theta}(X) \ge \frac{1}{\mathrm{Var}\left(\dfrac{\partial \log f(X, \theta)}{\partial \theta}\right)}. \tag{3}
\]
This is called the Cramer-Rao lower bound.
Please note (1) this lower bound holds for ALL unbiased estimators; and (2) the lower bound is the same as the approximate variance of the MLE. In other words, the MLE is (at least approximately) the best estimator. (Well, the MLE may sometimes have a small, finite-sample bias.)
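As a small illustrative simulation (added here; the model and parameter values are assumptions), for Y_1, …, Y_n ∼ N(θ, σ²) with σ² known the bound (3) works out to σ²/n, and the sample mean, which is unbiased, attains it exactly:

\begin{verbatim}
import numpy as np

# Sketch: for N(theta, sigma^2) with sigma^2 known, the C-R bound is sigma^2/n
# and the sample mean (an unbiased estimator) attains it.
rng = np.random.default_rng(2)
theta, sigma, n, reps = 5.0, 2.0, 50, 20_000

y = rng.normal(theta, sigma, size=(reps, n))
ybar = y.mean(axis=1)

print(ybar.var())        # Monte Carlo variance of the estimator
print(sigma ** 2 / n)    # C-R lower bound (3)
\end{verbatim}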
Using the identities, we can rewrite the C-R lower bound: since the score has mean zero (identity (1)), its variance equals E[(∂ log f/∂θ)²], which by identity (2) equals E[−∂² log f/∂θ²]. Therefore
\[
\mathrm{Var}\,\hat{\theta}(X) \ge \frac{1}{E\left[-\dfrac{\partial^2}{\partial \theta^2} \log f(X, \theta)\right]} = \frac{1}{E\left[\left(\dfrac{\partial}{\partial \theta} \log f(X, \theta)\right)^2\right]}.
\]
Please note that the condition allowing the interchange of derivative and integration will exclude, for example, the uniform [0, θ] distribution as f(x, θ), since its support depends on θ.
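A hedged illustration of why this exclusion matters (not part of the original notes): for the uniform [0, θ] family, the unbiased estimator (n+1)/n · max(X_i) has variance θ²/(n(n+2)), far below the value θ²/n that a naive application of (3) would give, so the bound simply does not hold there.

\begin{verbatim}
import numpy as np

# Sketch: for Uniform(0, theta) the regularity condition fails, and the
# unbiased estimator (n+1)/n * max(X_i) has variance theta^2/(n(n+2)),
# well below the naive value theta^2/n.
rng = np.random.default_rng(3)
theta, n, reps = 3.0, 20, 50_000

x = rng.uniform(0.0, theta, size=(reps, n))
est = (n + 1) / n * x.max(axis=1)   # unbiased estimator of theta

print(est.var())                    # roughly theta^2/(n*(n+2)) ~ 0.020
print(theta ** 2 / n)               # naive "bound" theta^2/n = 0.45
\end{verbatim}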
Fisher Information
One fundamental quantity in statistical inference is the Fisher information. We will define two kinds of Fisher information.
As with MLEs, we concentrate on the log likelihood.
Definition: The expected Fisher information about a parameter θ is the expectation of the minus second derivative of the log likelihood:
\[
E\left\{-\frac{\partial^2}{\partial \theta^2} \log \mathrm{lik}(\theta)\right\}.
\]
Definition: The observed Fisher information about a parameter θ is the minus second derivative of the log likelihood, with the parameter then replaced by the MLE:
\[
\left. -\frac{\partial^2}{\partial \theta^2} \log \mathrm{lik}(\theta)\right|_{\theta = \hat{\theta}(\mathrm{MLE})}.
\]
Example 1
Let Y_1, …, Y_n ∼ N(μ, σ²) with μ unknown and σ² known. The log-likelihood is
\[
-\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum (y_i - \mu)^2.
\]
Taking derivatives twice with respect to μ,
\[
-\frac{\partial^2 \log L(\mu)}{\partial \mu^2} = -\frac{\partial}{\partial \mu}\left[\frac{1}{2\sigma^2} \sum 2(y_i - \mu)\right] = n/\sigma^2.
\]
Here, both the expected and the observed Fisher information are equal to n/σ², because there is no random quantity to take the expectation of, and there is no unknown parameter μ left to be replaced.
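As a numerical check of the definitions (an added sketch; the data and parameter values are made up for illustration), the observed Fisher information can also be obtained by a finite-difference second derivative of the log-likelihood, and for this normal-mean model it should come out as n/σ² no matter what data are observed:

\begin{verbatim}
import numpy as np

# Sketch: observed Fisher information for the N(mu, sigma^2) mean via a
# finite-difference second derivative of the log-likelihood; should equal n/sigma^2.
rng = np.random.default_rng(4)
sigma, n = 2.0, 100
y = rng.normal(1.5, sigma, size=n)

def loglik(mu):
    return -(n / 2) * np.log(2 * np.pi * sigma ** 2) - np.sum((y - mu) ** 2) / (2 * sigma ** 2)

mu_hat = y.mean()                  # MLE of mu
h = 1e-3
obs_info = -(loglik(mu_hat + h) - 2 * loglik(mu_hat) + loglik(mu_hat - h)) / h ** 2

print(obs_info)                    # approximately n / sigma^2 = 25
\end{verbatim}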
Example 2
Let Y_1, …, Y_n ∼ Poisson(λ). The log-likelihood is
\[
-n\lambda + \sum y_i \ln \lambda + \ln\left(1 \Big/ \prod_i (y_i!)\right).
\]
Taking derivatives twice with respect to λ, we find
\[
-\frac{\partial^2}{\partial \lambda^2} \log L(\lambda) = -\frac{\partial}{\partial \lambda}\left[-n + \sum y_i / \lambda\right] = \sum y_i / \lambda^2.
\]
The MLE of λ is the sample mean Ȳ. The observed Fisher information here is Σ y_i / λ², with λ replaced by the MLE Ȳ, so it simplifies to n² / Σ y_i.
The expected Fisher information is
\[
E\left[\sum Y_i / \lambda^2\right] = n\lambda/\lambda^2 = n/\lambda = I_{\mathrm{Fisher}}(\lambda).
\]
The observed Fisher information is something you can calculate from the data. It depends on the data, so it is random.
The expected Fisher information is the theoretical value, often involving unknown parameters, but it is non-random.
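To make the distinction concrete, here is a small sketch for the Poisson example (an added illustration; λ and n are arbitrary choices): the observed information n²/Σ y_i changes from one simulated data set to the next, while the expected information n/λ is a fixed function of the parameter.

\begin{verbatim}
import numpy as np

# Sketch for Example 2 (Poisson): the observed information n^2/sum(y) is computed
# from the data and varies across data sets; the expected information n/lambda is
# a fixed (non-random) function of the parameter.
rng = np.random.default_rng(5)
lam, n = 3.0, 40

print("expected information n/lambda:", n / lam)
for _ in range(3):                  # three independent data sets
    y = rng.poisson(lam, size=n)
    print("observed information n^2/sum(y):", n ** 2 / y.sum())
\end{verbatim}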