
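# Empirical Likelihood: A Review

Stanislav Kolenikov
skolenik@unc.edu

117 New West, Cameron Ave,
University of North Carolina,
Chapel Hill, NC 27599-3260, US

## Parametric Likelihood - I

Usual assumptions:

$$\dim \Theta < \infty, \tag{1}$$

$$\theta_1, \theta_2 \in \Theta,\ \theta_1 \ne \theta_2 \Rightarrow \text{not } f(x|\theta_1) = f(x|\theta_2) \text{ a.e.} \tag{2}$$

$$l(\theta|x) = \log f(x|\theta) \in C^3(\Theta); \quad \left|\frac{\partial^3}{\partial\theta^3}\, l(\theta|x)\right| < g(x), \quad E_\theta |g(x)| < \infty \ \ \forall\theta \tag{3}$$

$$s(\theta) = \frac{\partial}{\partial\theta}\, l(\theta|x) \Rightarrow \tag{4}$$

$$E\, s(\theta) s^T(\theta) = E\left[-\frac{\partial}{\partial\theta}\, s(\theta)\right] = I(\theta), \quad I(\theta) > 0 \tag{5}$$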

## Parametric Likelihood - II

ML estimate:

$$\hat\theta_{ML} = \arg\max_{\theta\in\Theta} l(\theta|x), \quad \text{or (local max)} \quad s(\hat\theta_{ML}) = 0 \tag{6}$$

Properties: consistent, asymptotically normal, asymptotically efficient, invariant.

Wilks (1938) theorem: in testing $H_0: \theta = \theta_0 \in \operatorname{int}\Theta$ vs. $H_1: \theta \ne \theta_0$, $\dim\Theta = r$, Taylor expansion near $\theta_0$ together with consistency give that

$$W = -2\Big[\, l(\theta_0|x) - \sup_{\theta\in\Theta\setminus\{\theta_0\}} l(\theta|x) \Big] \xrightarrow{d} \chi^2(r) \tag{7}$$

Confidence intervals are of the form $\{\theta\in\Theta \mid l(\theta|x) - l(\hat\theta|x) > c\}$.

## Empirical distribution function

Suppose $X_1, \ldots, X_n \sim F$ for some unknown $F$. We then at least know that $\operatorname{supp} F \supseteq \{X_1, \ldots, X_n\}$. What is the ML estimate of the distribution function in the class of functions that satisfy this property? I.e.

$$L(F) = \prod_{i=1}^n F\{x_i\} \to \max \ \text{over probability measures with support on } [X_{(1)}, X_{(n)}] \tag{8}$$

The answer is the empirical CDF:

$$\hat F_{ML} = \frac{1}{n}\sum_i \delta_{x_i} \equiv F_n \tag{9}$$
## Functionals of distributions

How do we test hypotheses when we have no parameters? ($F_n$ is parameter-free).

Quite often, the parameter of the distribution is also its moment ($\theta = EX$) or a quantile ($P_\theta(X < \theta) = 1/2$).

Idea: use functionals of distributions: $\theta = T(F)$.

Examples:

mean: $\theta = EX = \int_{-\infty}^{+\infty} x\, dF(x)$

median: $\theta = a : \ -\int_{-\infty}^{a} dF(x) + \int_{a}^{+\infty} dF(x) = 0$

## Empirical likelihood - I

Owen (1988): for a distribution function $F$, define the empirical likelihood ratio

$$R(F) = L(F)/L(F_n) \tag{10}$$

Then the likelihood ratio test statistic for $H_0: T(F_0) = t$ is

$$W = -2 \sup\big\{ \log R(F) \mid T(F) = t,\ \operatorname{supp} F \subset [X_{(1)}, X_{(n)}] \big\} \tag{11}$$

Also, confidence regions are

$$R(c) = \{ T(F) \mid R(F) \ge c \} \tag{12}$$

## Empirical likelihood - II

What do "good" distribution functions under $H_0$ look like? It should be the case that the candidate $F \ll F_n$. Then $F = \sum_i w_i \delta_{X_i}$ for some non-negative $w_i$'s, $\sum_i w_i = 1$;

the likelihood function is $\tilde L(F, w) = \prod_{i=1}^n w_i$,

the likelihood ratio becomes $\tilde R(F, w) = \prod_{i=1}^n n w_i$.

## Simple example - I

Sample: 1, 7, 9

Mean: $\bar X = (1 + 7 + 9)/3 = 17/3 = 5.667$

Testing $\theta = EX = 5$: $w_1 = 0.4248$, $w_2 = 0.3009$, $w_3 = 0.2743$, $W = 0.1096$.

Testing $\theta = EX = 2$: $w_1 = 0.8544$, $w_2 = 0.0823$, $w_3 = 0.0633$, $W = 4.2383$.

What is the distribution of the empirical LR statistic? Should $W$ be compared to $\chi^2(1)$?
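The numbers in the simple example can be reproduced with a few lines of code. The sketch below is an illustration only: the function name `el_mean` and the use of bisection are assumptions of this example, not part of the slides. It uses the Lagrangian form of the weights, $w_i = 1/\{n(1 + \lambda(x_i - \mu))\}$, with $\lambda$ chosen so that the weighted mean equals the hypothesized value.

```python
import math

def el_mean(x, mu, iters=200):
    """Empirical likelihood weights and W = -2 log R for H0: E[X] = mu.

    Hypothetical helper for illustration; solves for the multiplier by bisection.
    """
    n = len(x)
    d = [xi - mu for xi in x]
    if not (min(d) < 0 < max(d)):
        raise ValueError("mu must lie strictly between min(x) and max(x)")
    # lambda must keep every 1 + lam * d_i positive
    lo, hi = -1.0 / max(d), -1.0 / min(d)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # g(lam) = sum d_i / (1 + lam d_i) is decreasing in lam on (lo, hi)
        if sum(di / (1.0 + lam * di) for di in d) > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    w = [1.0 / (n * (1.0 + lam * di)) for di in d]
    W = 2.0 * sum(math.log(1.0 + lam * di) for di in d)
    return w, W

w5, W5 = el_mean([1, 7, 9], 5)
w2, W2 = el_mean([1, 7, 9], 2)
print([round(v, 4) for v in w5], round(W5, 4))
print([round(v, 4) for v in w2], round(W2, 4))
```

Bisection is used here only because the estimating function is monotone in $\lambda$ on the admissible interval; a Newton step would converge faster but needs safeguarding near the interval ends.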
## Simple example - II

[Figure: empirical likelihood ratio of the expected value for the sample (1, 7, 9); horizontal axis θ from 1 to 9, vertical axis log LR from 0 to 12.]

Empirical likelihood ratio in this simple example

## Asymptotics - I

Owen (1988) shows the analogue of Wilks theorem for convergence of the empirical likelihood ratio for the population mean.

Theorem 1. Let $X_1, X_2, \ldots$ be independent random variables with nondegenerate distribution function $F_0$ s.t. $\int |x|^3\, dF_0 < \infty$. For positive $c < 1$ let

$$\mathcal{F}_{c,n} = \{ F \mid R(F) \ge c,\ F \ll F_n \}, \tag{13}$$

and define $X_{U,n} = \sup_{F\in\mathcal{F}_{c,n}} \int x\, dF$ and $X_{L,n} = \inf_{F\in\mathcal{F}_{c,n}} \int x\, dF$. Then as $n \to \infty$,

$$\operatorname{Prob}\big[ X_{L,n} \le E[X] \le X_{U,n} \big] \to \operatorname{Prob}\big[ \chi^2(1) \le -2\log c \big]. \tag{14}$$

Note: # of ancillary parameters $\to \infty$.
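The interval $[X_{L,n}, X_{U,n}]$ of Theorem 1 can be found numerically by locating the two values of the mean at which $-2\log R$ crosses the cutoff $-2\log c$. The sketch below does this for the sample (1, 7, 9). The helper names `el_w` and `el_interval` are hypothetical, and the default cutoff 3.841459 (the 0.95 quantile of $\chi^2(1)$) is an assumption of this example. With $n = 3$ the asymptotic calibration is of course only illustrative.

```python
import math

def el_w(x, mu, iters=200):
    """W(mu) = -2 log R for the mean; hypothetical helper, bisection on lambda."""
    d = [xi - mu for xi in x]
    if not (min(d) < 0 < max(d)):
        raise ValueError("mu must lie strictly between min(x) and max(x)")
    lo, hi = -1.0 / max(d), -1.0 / min(d)  # keeps all 1 + lam * d_i > 0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(di / (1.0 + lam * di) for di in d) > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

def el_interval(x, cutoff=3.841459, iters=100):
    """Endpoints where W(mu) = cutoff, one on each side of the sample mean."""
    xbar = sum(x) / len(x)

    def solve(edge):
        a, b = edge, xbar  # W grows without bound towards a and vanishes at b
        for _ in range(iters):
            mid = 0.5 * (a + b)
            if el_w(x, mid) > cutoff:
                a = mid
            else:
                b = mid
        return 0.5 * (a + b)

    return solve(min(x)), solve(max(x))

lower, upper = el_interval([1, 7, 9])
print(lower, upper)
```

Since $W = 4.2383$ at $\theta = 2$ exceeds the cutoff, the lower endpoint falls between 2 and the sample mean, and the interval is not symmetric about $\bar X$.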

## Outline of the proof - I

1. $X_{U,n} = \sup \sum w_i X_i$, $X_{L,n} = \inf \sum w_i X_i$, $w_i \ge 0$, $\sum w_i = 1$, $\prod n w_i \ge c$.

2. Assume $E[X] = 0$ and define $G = \sum \log n w_i + \gamma(1 - \sum w_i) + n\lambda(0 - \sum w_i X_i)$. Then taking the derivatives gives $w_i = 1/(\gamma + n\lambda X_i)$ and further $\gamma = n$, which implies that the log likelihood ratio in question is $\log R_0 = -\sum \log(1 + \lambda_0 X_i)$, where $\lambda_0$ is the root of $0 = n^{-1}\sum X_i/(1 + \lambda X_i) \equiv g(\lambda)$, $\lambda_0 \in (-X_{(n)}^{-1}, -X_{(1)}^{-1})$.

3. $\lambda_0 = O_p(n^{-q})$ for some $q < 1/2$, as $n^{1/2} g(n^{-q}) \le n^{1/2}\bar X - n^{1/2} S^2/n^q + n^{1/3}$ (where $S^2 = n^{-1}\sum X_i^2$) since the third moment is finite, and thus $\operatorname{Prob}\big[\max |X_i| > n^{1/3} \text{ i.o.}\big] = 0$.

## Outline of the proof - II

4. Taylor expansion for $g^{-1}(\cdot)$: $\lambda_0 = g^{-1}(0) = g^{-1}(\bar X) + (0 - \bar X)(g^{-1})'(\xi) = -\bar X (g^{-1})'(\xi)$, $|\xi| \le |\bar X|$, $\eta = g^{-1}(\xi)$, $|\eta| \le |\lambda_0|$, and then $\lambda_0 = r_0 \bar X/S^2$, $r_0 = -S^2/g'(\eta) \xrightarrow{p} 1$.

5. Taylor expansion for the likelihood ratio:

$$-2\log R_0 = 2\sum \log(1 + \lambda_0 X_i) = 2\sum \lambda_0 X_i - \sum (\lambda_0 X_i)^2 + \sum \eta_i$$

$$= 2n\bar X^2 r_0/S^2 - nS^2(\bar X r_0/S^2)^2 + \sum \eta_i = (2r_0 - r_0^2)\, n\bar X^2/S^2 + \sum \eta_i,$$

$\big|\sum \eta_i\big| \le |\lambda_0|^3 \sum |X_i|^3 = O_p(n^{-1/2})$, $2r_0 - r_0^2 = 1 + o_p(1)$, and by the CLT, $n\bar X^2/S^2 \xrightarrow{d} \chi^2(1)$.
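Step 2 of the proof outline compresses the Lagrange multiplier calculation into one sentence. Spelled out under the constraints $\sum w_i = 1$ and $\sum w_i X_i = 0$ stated there (the intermediate lines are filled in here and are not on the slides), it reads:

$$
\begin{aligned}
\frac{\partial G}{\partial w_i} &= \frac{1}{w_i} - \gamma - n\lambda X_i = 0
 \quad\Rightarrow\quad w_i = \frac{1}{\gamma + n\lambda X_i}, \\
\sum_i w_i \frac{\partial G}{\partial w_i} &= n - \gamma \sum_i w_i - n\lambda \sum_i w_i X_i = n - \gamma = 0
 \quad\Rightarrow\quad \gamma = n, \\
w_i &= \frac{1}{n(1 + \lambda X_i)}, \qquad
\log R_0 = \sum_i \log(n w_i) = -\sum_i \log(1 + \lambda_0 X_i), \\
0 &= \sum_i w_i X_i = n^{-1}\sum_i \frac{X_i}{1 + \lambda_0 X_i} = g(\lambda_0).
\end{aligned}
$$

The requirement $w_i > 0$ for all $i$ is what confines $\lambda_0$ to the interval $(-X_{(n)}^{-1}, -X_{(1)}^{-1})$ when $X_{(1)} < 0 < X_{(n)}$.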
## M-estimates

Theorem 1 can be extended to the M-estimates $\tau = T(F)$ that solve, for some "regular" $\psi$, the equation

$$\int \psi(X, \tau)\, F(dX) = 0 \tag{15}$$

Examples.

1. Mean: $\psi(x, t) = x - t$

2. Quantiles: $\psi(x, t) = \begin{cases} 1, & x \le t \\ -q/(1 - q), & x > t \end{cases}$

3. Huber's robust location: $\psi(x, t) = \begin{cases} c, & x - t \ge c \\ x - t, & |x - t| \le c \\ -c, & x - t \le -c \end{cases}$

## Asymptotics - II

Owen (1990) gives an extended version of Theorem 1:

Theorem 2. If $X_1, X_2, \ldots \sim$ i.i.d. in $\mathbb{R}^p$, $\mu = E[X_1]$ is finite, $\operatorname{rk} \operatorname{Cov} X_1 = q$, $C_{r,n} = \big\{ \int X\, dF \mid R(F) \ge r,\ F \ll F_n \big\}$. Then

$$\lim_{n\to\infty} \operatorname{Prob}[\mu \in C_{r,n}] = \operatorname{Prob}[\chi^2(q) \le -2\ln r] \tag{16}$$

Moreover, if $E\|X\|^4 < \infty$, then the rate of convergence is $O(n^{-1/2})$.

The solution is then given by the dual convex problem:

$$-\sum_i \ln\big(1 + \lambda^T(X_i - \mu)\big) \to \min_{\lambda} \tag{17}$$

$$0 = \sum_i \frac{X_i - \mu}{1 + \lambda^T(X_i - \mu)} \tag{18}$$

$$F_\mu\{X_i\} = \frac{1}{n}\, \frac{1}{1 + \lambda^T(x_i - \mu)} \tag{19}$$
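For the M-estimating equation (15), a discrete candidate distribution $F = \sum_i w_i\delta_{X_i}$ turns the integral into the weighted sum $\sum_i w_i\psi(X_i, \tau) = 0$. The sketch below solves that equation for the mean and Huber $\psi$ functions from the examples. The names `psi_mean`, `psi_huber` and `m_estimate` are hypothetical, and the bisection assumes $\psi$ is nonincreasing in $t$, which holds for these two examples but not for the quantile $\psi$ as written.

```python
def psi_mean(x, t):
    """psi for the mean: x - t."""
    return x - t

def psi_huber(x, t, c=1.0):
    """Huber's psi: x - t clipped to [-c, c]."""
    return max(-c, min(c, x - t))

def m_estimate(x, psi, w=None, iters=200):
    """Solve sum_i w_i * psi(x_i, t) = 0 for t by bisection on [min(x), max(x)].

    Illustrative helper; assumes psi(x, t) is nonincreasing in t.
    """
    n = len(x)
    if w is None:
        w = [1.0 / n] * n  # equal weights correspond to F_n
    lo, hi = min(x), max(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(wi * psi(xi, mid) for wi, xi in zip(w, x)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [1, 7, 9]
print(m_estimate(sample, psi_mean))   # the sample mean under equal weights
print(m_estimate(sample, psi_huber))  # Huber location with c = 1
```

Passing unequal weights `w` evaluates the same functional at a reweighted distribution, which is the operation the empirical likelihood ratio profiles over.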

## Asymptotics - III

DiCiccio, Hall and Romano (1989) expand the latter expression in $\lambda^T(X_i - \mu)$ to show that $\lambda = \bar X - \mu + \varepsilon$, where $\varepsilon = O(n^{-1}\log\log n)$ a.s. and $\varepsilon = O(n^{-1})$ in probability if $E\|X\|^4 < \infty$. Furthermore, introducing

$$g(\nu) = \sum_{j,k} \nu^j \nu^k E[X^j X^k X] \quad \text{and} \quad \Delta = n^{-1}\sum_i X_i X_i^T - I,$$

they show that if $E\|X\|^6 < \infty$,

## Asymptotics - IV

By getting a similar expansion for the parametric likelihoods, DiCiccio, Hall and Romano (1989) conclude that the two do not necessarily agree even to the first order, i.e. $O_p(n^{-1/2})$. They exemplify the point with a double exponential distribution (where the mean is not the location parameter, however). The confidence region coverage, however, has an error of order $O(n^{-1/2})$, i.e. the empirical likelihood provides right nominal coverage.

Finally, they expand the empirical likelihood ratio

$$l_E(\mu) = n\lambda^T(I + \Delta)\lambda - \frac{4}{3}\sum_i \big[(\bar X - \mu)^T X_i\big]^3 + R_n, \quad R_n = O_p(n^{-1}).$$
## Bivariate example - I

Owen (1990) illustrates the multivariate extension with the genetic experiment on duck plumage data (see references in the article). Fig. 1 shows the empirical likelihood contours that used 20/9 times the $F_{2,9}$ distribution (small sample correction instead of the asymptotic $\chi^2(2)$). Fig. 2 shows parametric likelihood ratio contours for Hotelling's $T^2$ statistic based on the $F_{2,9}$ distribution. Fig. 3 (not shown) attempts to construct the bootstrap based confidence regions.

## Regression - I

Stochastic regressors case is easier! Normal equations / estimating equations / moment conditions (econometrics).

Empirical likelihood ratio test: whether $E Z_i = 0$, where $Z_i = x_i(y_i - x_i^T\beta)$ as in the constraint below.

$$\log R(\beta) = \max\Big\{ \sum \ln n w_i \ \Big|\ w_i \ge 0,\ \sum w_i = 1,\ \sum w_i x_i (y_i - x_i^T\beta) = 0 \Big\} \tag{22}$$

Can also build CIs for a subset of $r$ regressors: compare $-2\log R(\beta)$ to $\chi^2(r)$. Owen (1990): compare to scaled $F$ distribution?
## Regression - II

The case with fixed regressors is more difficult.

Theorem 3.

$$\mu_4(x) = \int (Y - x\beta_0)^4\, dF_x, \quad n^{-2}\sum_i x_i\, \mu_4(x_i) \to 0,$$

$$\operatorname{Prob}\big[ \operatorname{conv}\{x_i \mid Y_i - x_i\beta_0 > 0\} \cap \operatorname{conv}\{x_i \mid Y_i - x_i\beta_0 < 0\} \ne \emptyset \big] \to 1,$$

$$a < \sigma^2(x_i) < b\|x_i\|^\alpha, \quad a, b > 0,\ \alpha \ge 0,$$

$$a < \min \lambda\{X^T X/n\}, \quad \frac{1}{n}\sum \|x_i\|^{2+\alpha} < b$$

$$\Rightarrow\ -2\log R(\beta_0) \xrightarrow{d} \chi^2(p) \tag{23}$$

To the order $O_p(N^{-1/2})$, the empirical likelihood is equivalent to Huber / White / 1st order linearization heteroskedasticity consistent covariance matrix.

## Regression - III

Misspecification: $E(Z_i) = \mu_i$, $\operatorname{Var} Z_i = V_i$ is of full rank (1D case: $V_i = \sigma_i^2$). We are testing whether $\bar\mu = \frac{1}{n}\sum \mu_i$ takes a given value $\mu_0$. The empirical likelihood test refers

$$\frac{n(\bar Z - \mu_0)^2}{\frac{1}{n}\sum_i (Z_i - \mu_0)^2}$$

to a $\chi^2$ distribution. The variance estimate in the denominator is biased upward due to the model misspecification:

$$E\, \frac{1}{n}\sum_i (Z_i - \mu_0)^2 = \frac{1}{n}\sum_i \sigma_i^2 + \frac{1}{n}\sum_i (\mu_i - \mu_0)^2$$

Thus, Owen (1991) concludes, confidence sets for $\beta_0$ will not have consistent coverage, but will be conservative.

## Small sample properties - I

Asymptotically, everything is standard normal or $\chi^2$. What is the scope for empirical likelihood, then?

Hint: bias, skewness, heavy tails.

Owen (1990) argues that the empirical likelihood corrects the skewness as compared to Student's $t$, and notes that Johnson's $t$ (Johnson 1978) corrects for both skewness and bias.

## Edgeworth and Cornish-Fisher expansions - I

Edgeworth expansion: the one for distribution functions:

$$\operatorname{Prob}\big[ n^{1/2}(\hat\theta - \theta_0)/\sigma \le x \big] = \Phi(x) + n^{-1/2} p_1(x)\phi(x) + n^{-1} p_2(x)\phi(x) + \ldots + n^{-j/2} p_j(x)\phi(x) + \ldots \tag{24}$$

Cornish-Fisher expansion: the one for quantiles:

$$\operatorname{Prob}\big[ n^{1/2}(\hat\theta - \theta_0)/\sigma \le u_\alpha \big] = \alpha \ \Rightarrow\ u_\alpha = z_\alpha + n^{-1/2}\tilde p_1(z_\alpha) + n^{-1}\tilde p_2(z_\alpha) + \ldots + n^{-j/2}\tilde p_j(z_\alpha) + \ldots \tag{25}$$
## Edgeworth expansion - II

Sums of independent random variables: if $X_1, X_2, \ldots$ are i.i.d., then $S_n = n^{1/2}(\bar X - E[X])/(\operatorname{Var} X)^{1/2}$ is asymptotically standard normal. If

$$\chi(t) = E\exp\left\{ it\, \frac{X - E[X]}{(\operatorname{Var} X)^{1/2}} \right\} = \exp\Big\{ \sum_j \kappa_j (it)^j/j! \Big\} \tag{26}$$

is the characteristic function of the normalized distribution, where $\kappa_j$'s are its cumulants, then

$$\chi_n(t) \equiv \chi_{S_n}(t) = \big[\chi(t/n^{1/2})\big]^n = \exp\Big\{ -\frac{t^2}{2} + n^{-1/2}\kappa_3 \frac{(it)^3}{3!} + \ldots \Big\} = e^{-t^2/2}\big[ 1 + n^{-1/2} r_1(it) + n^{-1} r_2(it) + \ldots \big], \tag{27}$$

where $r_j$ is a polynomial of degree $3j$ depending on the cumulants up to order $j + 2$.

## Edgeworth expansion - III

In particular,

$$r_1(u) = \frac{1}{6}\kappa_3 u^3, \quad r_2(u) = \frac{1}{24}\kappa_4 u^4 + \frac{1}{72}\kappa_3^2 u^6. \tag{28}$$

As $\chi_n(t) = \int e^{itx}\, dF_{S_n}(x) = \int e^{itx}\, d\big[\Phi(x) + \sum_j n^{-j/2} R_j(x)\big]$, the functions $R_j(\cdot)$ are the solutions of $\int e^{itx}\, dR_j(x) = r_j(it)\, e^{-t^2/2}$, which can be shown to be related to the derivatives of the normal CDF:

$$R_j(x) = r_j\Big(-\frac{d}{dx}\Big)\Phi(x) \tag{29}$$

Then the first two terms of the Edgeworth expansion for the sample mean are

$$R_1(x) = -\frac{1}{6}\kappa_3 (x^2 - 1)\phi(x), \tag{30}$$

$$R_2(x) = -x\Big[ \frac{1}{24}\kappa_4 (x^2 - 3) + \frac{1}{72}\kappa_3^2 (x^4 - 10x^2 + 15) \Big]\phi(x) \tag{31}$$
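To see what the correction terms (30) and (31) do numerically, the following sketch evaluates $\Phi(x) + n^{-1/2}R_1(x) + n^{-1}R_2(x)$ for a standardized sample mean. The function names are hypothetical. The cumulants $\kappa_3 = 2$, $\kappa_4 = 6$ used in the call are those of a standardized exponential distribution, chosen here only as an illustration. The $R_2$ term carries the factor $\phi(x)$, as it does in Hall (1992).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth_cdf(x, n, k3, k4=0.0, terms=2):
    """Edgeworth approximation to P(S_n <= x) for a standardized sample mean.

    k3, k4: third and fourth cumulants of the standardized population.
    terms=1 keeps only the n^{-1/2} skewness correction.
    """
    r1 = -(k3 / 6.0) * (x * x - 1.0) * phi(x)
    r2 = -x * ((k4 / 24.0) * (x * x - 3.0)
               + (k3 ** 2 / 72.0) * (x ** 4 - 10.0 * x * x + 15.0)) * phi(x)
    out = Phi(x) + r1 / math.sqrt(n)
    if terms >= 2:
        out += r2 / n
    return out

# Skewed population (exponential-like cumulants), n = 25, evaluated at x = 0
print(edgeworth_cdf(0.0, 25, 2.0, 6.0))
# Symmetric, normal-like cumulants: the correction terms vanish
print(edgeworth_cdf(1.0, 10, 0.0, 0.0), Phi(1.0))
```

At $x = 0$ only the skewness term contributes, which shifts the approximation away from the normal value of 1/2.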

## Cornish-Fisher expansion - IV

Owen (1990) expands the likelihood ratio (of the mean parameter):

$$-2\log R = t_\mu^2 + 2n^{-1/2}\, t_\mu^3 \gamma_\mu/3 + o(n^{-1/2}) \tag{32}$$

and concludes that the Cornish-Fisher expansion

$$CF\big(\sqrt{-2\log R}\big) = Z_1 - n^{-1/2}\gamma/6 - n^{-1/2} A Z_1 Z_2 + o(n^{-1/2}), \quad Z_1, Z_2 \sim \text{i.i.d. } N(0, 1)$$

indicates bias in the empirical likelihood ratio. The bias disappears for the two-sided CIs though. Cf. expansion for the Student's $t$:

$$CF(t) = Z_1 - n^{-1/2}\gamma/6 - n^{-1/2}\gamma Z_1^2/3 - n^{-1/2} A Z_1 Z_2 + o(n^{-1/2}) \tag{33}$$

## Further Corrections - I

Empirical likelihood per se: corrects skewness; central CIs have coverage errors of $O(n^{-1})$, one-sided CIs, $O(n^{-1/2})$.

Hall (1990):

Bartlett correction reduces the central coverage errors to $O(n^{-2})$;

location scale modification reduces the one-sided coverage errors to $O(n^{-1})$;

location adjustment of order $O(n^{-1})$ makes the empirical likelihood confidence regions second order correct. That is, without the adjustment, they are of the first-order correct size, shape and orientation.
## Corrections - II

If $\hat\theta$ is an estimator of the parameter $\theta_0$ with $Q = \lim \operatorname{Cov}[n^{1/2}\hat\theta]$, and $\hat Q$ is an estimate of the asymptotic covariance matrix (bootstrap, jackknife, unknowns replaced by the sample estimates), then the density of $\hat\eta_0 = \hat Q^{-1/2}(\hat\theta - \theta_0)$ can be approximated by $N(0, I)$. Hall (1990) shows that the empirical likelihood regions are rather based on $\hat\xi_0 = (Q^{1/2}\hat Q^{-1} Q^{1/2})^{1/2} Q^{-1/2}(\hat\theta - \theta_0)$. They agree to the second order with the pseudo-likelihood contours based on $\hat\xi_0 + n^{-1}\psi$ for a certain fixed $\psi$.

## Corrections - III

(i) $\mu_0 = E(X)$ (assume 0), $\operatorname{Cov} X = \Sigma_0$ (assume $I$), $\theta: \mathbb{R}^r \to \mathbb{R}^s$, so $\theta = \theta(\mu)$

(ii) $\Theta = \partial\theta/\partial\mu|_{\mu=\mu_0}$, $Q = \Theta\Sigma_0\Theta^T$, $\hat Q = \hat\Theta\hat\Sigma_0\hat\Theta^T$, $R = \Theta^T Q^{-1}\Theta$,

(iii) $\alpha^{jkl} = E(X^j X^k X^l)$,

$$\psi^u = -\frac{1}{2}\,(Q^{-\frac12}\Theta)^{uj} R^{kl} \alpha^{jkl} + \frac{1}{2}\sum_{j,k}\Big[ \big(Q^{-\frac12}(\theta_{jk} R^{jk} - \theta_{jj})\big)^u - (Q^{-\frac12}\Theta)^{uj} (\Theta^T Q^{-1}\theta_{jk})^k \Big]$$

(iv) $\hat\xi = \hat\xi_0 + n^{-1}\psi \Rightarrow g(y) = \phi(y)\big(1 + n^{-1/2} q(y) + O(n^{-1})\big)$ is the density of $\hat\xi$; $q(y)$ is a cubic polynomial in $y$.

(v) $T(x) = n x^T x - n\, 2q(x) + s\ln(2\pi/n) + O_p(n^{-1}) \Rightarrow l_\theta(\theta(\nu)) = T(\hat\xi(\nu)) + s\ln(n/2\pi) + O_p(n^{-1})$

## Corrections - IV

Mentioned earlier: first order correct coverage of the CIs:

$$\operatorname{Prob}[\theta_0 \in R(x)] = \operatorname{Prob}[\chi^2(s) \le x] + O(n^{-1}), \tag{34}$$

Bartlett correction: define $b$ by

$$E[l_\theta(\mu_0)] = s(1 + n^{-1} b) + O(n^{-2}). \tag{35}$$

If $\hat b$ is a $\sqrt{n}$-consistent estimator of $b$ (say obtained by replacing unknowns by their sample analogues), then the Bartlett-corrected confidence region is

$$R_B(x) = \big\{ \theta(\nu) \mid l_\theta(\nu) \le x(1 + n^{-1}\hat b) \big\} \tag{36}$$

The result is the second order correct coverage

$$\operatorname{Prob}[\theta_0 \in R_B(x)] = \operatorname{Prob}[\chi^2(s) \le x] + O(n^{-2}). \tag{37}$$

## Conclusion

• Empirical likelihood is a non-parametric method of inference on distribution functionals.

• It is applicable for many settings including means, M-estimates, moments, and smooth functions of moments (correlation, regression).

• Empirical likelihood improves upon the normal approximation by adjusting for skewness / third order properties of the population distribution.

• Further improvement of coverage is through location and / or Bartlett corrections.

## References

DiCiccio, T. J., P. Hall, and J. P. Romano (1989). Comparison of Parametric and Empirical Likelihood Functions. Biometrika, 76 (3), 465–476.

Hall, P. (1990). Pseudo-Likelihood Theory for Empirical Likelihood. Ann. Stat., 18 (1), 121–140.

Johnson, N. J. (1978). Modified t tests and confidence intervals for asymmetrical populations. JASA, 73, 536–544.

Owen, A. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika, 75 (2), 237–249.

Owen, A. (1990). Empirical Likelihood Ratio Confidence Regions. Ann. Stat., 18 (1), 90–120.

Owen, A. (1991). Empirical Likelihood for Linear Models. Ann. Stat., 19 (4), 1725–1747.

## References - II

Cox, D. R., and D. V. Hinkley (1974). Theoretical Statistics. Chapman and Hall, London.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, NY.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, NY.
