
Empirical Likelihood: A Review

Stanislav Kolenikov
skolenik@unc.edu
117 New West, Cameron Ave, University of North Carolina, Chapel Hill, NC 27599-3260, US


Parametric Likelihood - I

Usual assumptions:

    \dim\Theta < \infty,                                                        (1)

    \theta_1, \theta_2 \in \Theta,\ \theta_1 \neq \theta_2 \Rightarrow
        \text{not } f(x|\theta_1) = f(x|\theta_2) \text{ a.e.},                 (2)

    l(\theta|x) = \log f(x|\theta) \in C^3(\Theta); \quad
        \Bigl|\frac{\partial^3}{\partial\theta^3} l(\theta|x)\Bigr| < g(x),\quad
        E_\theta|g(x)| < \infty \ \forall\theta,                                (3)

    s(\theta) = \frac{\partial}{\partial\theta} l(\theta|x) \Rightarrow         (4)

    E\bigl[s(\theta)s(\theta)^T\bigr] =
        E\Bigl[-\frac{\partial}{\partial\theta}s(\theta)\Bigr] = I(\theta),
        \quad I(\theta) > 0.                                                    (5)
Parametric Likelihood - II

ML estimate:

    \hat\theta_{ML} = \arg\max_{\theta\in\Theta} l(\theta|x), \quad
    \text{or (local max)} \quad s(\hat\theta_{ML}) = 0                          (6)

Properties: consistent, asymptotically normal, asymptotically efficient, invariant.

Wilks (1938) theorem: in testing H_0: \theta = \theta_0 \in \operatorname{int}\Theta
vs. H_1: \theta \neq \theta_0, \dim\Theta = r, a Taylor expansion near \theta_0
together with consistency gives that

    W = -2\Bigl[l(\theta_0|x) - \sup_{\theta\in\Theta} l(\theta|x)\Bigr]
        \xrightarrow{d} \chi^2(r)                                               (7)

Confidence intervals are of the form
\bigl\{\theta \in \Theta \,\big|\, l(\theta|x) - l(\hat\theta|x) > c\bigr\}.


Empirical distribution function

Suppose X_1, \dots, X_n \sim F for some unknown F. We then at least know that
\operatorname{supp} F \supseteq \{X_1, \dots, X_n\}. What is the ML estimate of the
distribution function in the class of functions that satisfy this property? I.e.,

    L(F) = \prod_{i=1}^n F\{x_i\} \to \max \ \text{over probability measures
    with support on } [X_{(1)}, X_{(n)}]                                        (8)

The answer is the empirical distribution function:

    \hat F_{ML} = \frac{1}{n}\sum_i \delta_{x_i} \equiv F_n                     (9)
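The NPMLE characterization in (8)-(9) is easy to check numerically: among
distributions supported on the sample, equal weights 1/n maximize the likelihood. A
minimal sketch (the simulated sample is an illustrative assumption):

```python
import numpy as np

# The nonparametric likelihood of F = sum_i w_i * delta_{x_i} is prod_i w_i;
# by concavity of log, it is maximized at w_i = 1/n, i.e. at the ECDF F_n.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
n = len(x)

def log_lik(w):
    # log L(F) for a distribution putting mass w_i on observation x_i
    return np.sum(np.log(w))

w_ecdf = np.full(n, 1.0 / n)
for _ in range(1000):
    w = rng.dirichlet(np.ones(n))       # random candidate weights on the simplex
    assert log_lik(w) <= log_lik(w_ecdf) + 1e-12
print(log_lik(w_ecdf))                  # = -n*log(n), about -23.026 for n = 10
```

No random point on the probability simplex ever beats the ECDF, in line with (9).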
Functionals of distributions

How do we test hypotheses when we have no parameters? (F_n is parameter-free.)

Quite often, the parameter of the distribution is also its moment (\theta = EX) or a
quantile (P_\theta(X < \theta) = 1/2).

Idea: use functionals of distributions: \theta = T(F).

Examples:

    mean: \theta = EX = \int_{-\infty}^{+\infty} x \, dF(x)

    median: \theta = a :\ -\int_{-\infty}^{a} dF(x) + \int_{a}^{+\infty} dF(x) = 0


Empirical likelihood - I

Owen (1988): for a distribution function F, define the empirical likelihood ratio

    R(F) = L(F)/L(F_n)                                                          (10)

Then the likelihood ratio test statistic for H_0: T(F_0) = t is

    W = -2 \sup\bigl\{\log R(F) \,\big|\, T(F) = t,\
        \operatorname{supp} F \subset [X_{(1)}, X_{(n)}]\bigr\}                 (11)

Also, confidence regions are

    R(c) = \bigl\{T(F) \,\big|\, R(F) \geq c\bigr\}                             (12)
Empirical likelihood - II

How do "good" distribution functions under H_0 look? It should be the case that a
candidate F \ll F_n. Then F = \sum_i w_i \delta_{X_i} for some non-negative w_i's
with \sum_i w_i = 1;

the likelihood function is \tilde L(F, w) = \prod_{i=1}^n w_i;

the likelihood ratio becomes \tilde R(F, w) = \prod_{i=1}^n n w_i.

What is the distribution of the empirical LR statistic? Should W be compared to
\chi^2(1)?


Simple example - I

Sample: 1, 7, 9.

Mean: \bar X = (1 + 7 + 9)/3 = 17/3 = 5.667.

Testing \theta = EX = 5: w_1 = 0.4248, w_2 = 0.3009, w_3 = 0.2743, W = 0.1096.

Testing \theta = EX = 2: w_1 = 0.8544, w_2 = 0.0823, w_3 = 0.0633, W = 4.2383.
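These weights can be reproduced by solving the inner optimization with a Lagrange
multiplier; a minimal sketch (the helper name `el_mean_test` is illustrative, not
from the source), using `scipy.optimize.brentq` to find the multiplier:

```python
import numpy as np
from scipy.optimize import brentq

def el_mean_test(x, mu):
    """EL ratio test of H0: E[X] = mu. Solves the multiplier equation
    sum_i (x_i - mu)/(1 + lam*(x_i - mu)) = 0 for lam, then sets
    w_i = 1/(n*(1 + lam*(x_i - mu))) and W = -2*sum(log(n*w_i))."""
    x = np.asarray(x, dtype=float)
    n, d = len(x), np.asarray(x, dtype=float) - mu
    if d.min() >= 0 or d.max() <= 0:    # mu outside the convex hull of the data
        return None, np.inf
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    eps = 1e-10                          # stay inside the domain 1 + lam*d > 0
    lam = brentq(g, -1.0 / d.max() + eps, -1.0 / d.min() - eps)
    w = 1.0 / (n * (1.0 + lam * d))
    return w, -2.0 * np.sum(np.log(n * w))

w, W = el_mean_test([1, 7, 9], 5)
print(np.round(w, 4), round(W, 4))      # ~ [0.4248 0.3009 0.2743], W ~ 0.1096
w2, W2 = el_mean_test([1, 7, 9], 2)
print(np.round(w2, 4), round(W2, 4))    # ~ [0.8544 0.0823 0.0633], W ~ 4.2383
```

Both runs reproduce the slide's numbers: W = 0.1096 for the plausible value 5 and
W = 4.2383 for the implausible value 2.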
Simple example - II

[Figure: the empirical likelihood ratio (-2 log R) of the expected value for the
sample (1, 7, 9), plotted against \theta.]


Asymptotics - I

Owen (1988) shows the analogue of the Wilks theorem for convergence of the empirical
likelihood ratio for the population mean.

Theorem 1. Let X_1, X_2, \dots be independent random variables with nondegenerate
distribution function F_0 s.t. \int |x|^3 \, dF_0 < \infty. For positive c < 1 let

    F_{c,n} = \bigl\{F \,\big|\, R(F) \geq c,\ F \ll F_n\bigr\},                (13)

and define X_{U,n} = \sup_{F_{c,n}} \int x \, dF and
X_{L,n} = \inf_{F_{c,n}} \int x \, dF. Then as n \to \infty,

    \mathrm{Prob}\bigl[X_{L,n} \leq E[X] \leq X_{U,n}\bigr] \to
    \mathrm{Prob}\bigl[\chi^2(1) \leq -2\log c\bigr].                           (14)

Note: the number of ancillary parameters \to \infty.
Outline of the proof - I

1. X_{U,n} = \sup \sum w_i X_i, X_{L,n} = \inf \sum w_i X_i, subject to
   w_i \geq 0, \sum w_i = 1, \prod n w_i \geq c.

2. Assume E[X] = 0 and define

       G = \sum \log n w_i + \gamma\Bigl(1 - \sum w_i\Bigr)
           + n\lambda\Bigl(0 - \sum w_i X_i\Bigr).

   Taking the derivatives gives w_i = 1/(\gamma + n\lambda X_i) and further
   \gamma = n, which implies that the log likelihood ratio in question is
   \log R_0 = -\sum \log(1 + \lambda_0 X_i), where \lambda_0 is the root of
   0 = n^{-1} \sum X_i/(1 + \lambda X_i) \equiv g(\lambda),
   \lambda_0 \in (-X_{(n)}^{-1}, -X_{(1)}^{-1}).

3. \lambda_0 = O_p(n^{-q}) for some q < 1/2, as
   n^{1/2} g(n^{-q}) \leq n^{1/2}\bar X - n^{1/2} S^2/n^q + n^{1/3}
   (where S^2 = n^{-1}\sum X_i^2), since the third moment is finite and thus
   \mathrm{Prob}\bigl[\max |X_i| > n^{1/3} \text{ i.o.}\bigr] = 0.


Outline of the proof - II

4. Taylor expansion for g^{-1}(\cdot):
   \lambda_0 = g^{-1}(0) = g^{-1}(\bar X) + (0 - \bar X)(g^{-1})'(\xi)
   = -\bar X (g^{-1})'(\xi), with |\xi| \leq |\bar X|, \eta = g^{-1}(\xi),
   |\eta| \leq |\lambda_0|; and then \lambda_0 = r_0 \bar X/S^2, where
   r_0 = -S^2/g'(\eta) \xrightarrow{p} 1.

5. Taylor expansion for the likelihood ratio:

       -2\log R_0 = 2\sum \log(1 + \lambda_0 X_i)
       = 2\lambda_0 \sum X_i - \sum (\lambda_0 X_i)^2 + \sum \eta_i
       = 2n\bar X^2 r_0/S^2 - nS^2 (\bar X r_0/S^2)^2 + \sum \eta_i
       = (2r_0 - r_0^2)\, n\bar X^2/S^2 + \sum \eta_i,

   where \sum |\eta_i| \leq |\lambda_0|^3 \sum |X_i|^3 = O_p(n^{-1/2}),
   2r_0 - r_0^2 = 1 + o_p(1), and by the CLT,
   n\bar X^2/S^2 \xrightarrow{d} \chi^2(1).
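Step 5 can be checked numerically: for a mean-zero sample, the empirical LR statistic
at the true mean is close to n\bar X^2/S^2. A sketch under an assumed centered
exponential sample (skewed, E[X] = 0, Var X = 1):

```python
import numpy as np
from scipy.optimize import brentq

# Numerical check of step 5: -2 log R0 at mu = 0 nearly equals n*xbar^2/S^2,
# which is asymptotically chi^2(1) by the CLT.
rng = np.random.default_rng(42)
x = rng.exponential(size=500) - 1.0               # assumed example: Exp(1) - 1
n = len(x)

g = lambda lam: np.sum(x / (1.0 + lam * x))       # step 2: equation for lambda_0
eps = 1e-10
lam0 = brentq(g, -1.0 / x.max() + eps, -1.0 / x.min() - eps)
W = 2.0 * np.sum(np.log(1.0 + lam0 * x))          # -2 log R0
approx = n * x.mean() ** 2 / np.mean(x ** 2)      # n*xbar^2/S^2
print(round(W, 4), round(approx, 4))              # the two nearly agree
```

The discrepancy between the two printed values is of order n^{-1/2}, matching the
bound on \sum \eta_i in the outline.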
M-estimates

Theorem 1 can be extended to the M-estimates \tau = T(F) that solve, for some
"regular" \psi, the equation

    \int \psi(X, \tau) \, F(dX) = 0                                             (15)

Examples.

1. Mean: \psi(x, t) = x - t.

2. Quantiles: \psi(x, t) = 1 for x \leq t; \psi(x, t) = -q/(1 - q) for x > t.

3. Huber's robust location: \psi(x, t) = c for x - t \geq c;
   \psi(x, t) = x - t for |x - t| \leq c; \psi(x, t) = -c for x - t \leq -c.


Asymptotics - II

Owen (1990) gives an extended version of Theorem 1:

Theorem 2. If X_1, X_2, \dots \sim i.i.d. in R^p, \mu = E[X_1] is finite,
\operatorname{rk}\operatorname{Cov} X_1 = q, and C_{r,n} = \bigl\{\int X \, dF
\,\big|\, R(F) \geq r,\ F \ll F_n\bigr\}, then

    \lim_{n\to\infty} \mathrm{Prob}[\mu \in C_{r,n}] =
    \mathrm{Prob}[\chi^2(q) \leq -2\ln r]                                       (16)

Moreover, if E\|X\|^4 < \infty, then the rate of convergence is O(n^{-1/2}).

The solution is then given by the dual convex problem:

    -\sum_i \ln\bigl(1 + \lambda'(X_i - \mu)\bigr) \to \min_\lambda             (17)

    0 = \sum_i \frac{X_i - \mu}{1 + \lambda'(X_i - \mu)}                        (18)

    F_\mu\{X_i\} = \frac{1}{n} \cdot \frac{1}{1 + \lambda'(x_i - \mu)}          (19)
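The dual problem (17)-(19) is a smooth convex minimization in \lambda, so a few
damped Newton steps solve it; a minimal sketch (the helper `el_multiplier` and the
simulated bivariate sample are illustrative, not from the source):

```python
import numpy as np

# Sketch of the dual (17)-(19): minimize -sum_i log(1 + lambda'(X_i - mu)) over
# lambda with damped Newton steps, keeping every 1 + lambda'(X_i - mu) > 0.
def el_multiplier(x, mu, iters=50):
    d = np.asarray(x, float) - mu                       # (n, p) matrix X_i - mu
    lam = np.zeros(d.shape[1])
    for _ in range(iters):
        t = 1.0 + d @ lam
        grad = -(d / t[:, None]).sum(axis=0)            # gradient of (17)
        if np.abs(grad).max() < 1e-10:                  # eq. (18) satisfied
            break
        hess = (d[:, :, None] * d[:, None, :] / (t ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess, -grad)
        s = 1.0
        while np.any(1.0 + d @ (lam + s * step) <= 0):  # damp to stay feasible
            s /= 2.0
        lam = lam + s * step
    return lam

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))               # p = 2, true mean is zero
lam = el_multiplier(x, mu=np.zeros(2))
w = 1.0 / (200 * (1.0 + x @ lam))           # eq. (19): implied EL weights
W = -2.0 * np.sum(np.log(200 * w))          # -2 log R(mu), compare to chi^2(2)
print(round(W, 3))
```

At the solution the weights automatically sum to one, since (18) implies
\sum_i 1/(1 + \lambda'(X_i - \mu)) = n.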

Asymptotics - III

DiCiccio, Hall and Romano (1989) expand the latter expression in
\lambda'(X_i - \mu) to show that \lambda = \bar X - \mu + \varepsilon, where
\varepsilon = O(n^{-1}\log\log n) a.s., and \varepsilon = O(n^{-1}) in probability
if E\|X\|^4 < \infty. Furthermore, introducing

    g(\nu) = \sum_{j,k} \nu^j \nu^k E[X^j X^k X] \quad \text{and} \quad
    \Delta = n^{-1}\sum_i X_i X_i^T - I,

they show that if E\|X\|^6 < \infty,

    \lambda = \bar X - \mu + g(\bar X - \mu) - \Delta(\bar X - \mu) + \xi,
    \quad \xi = O_p(n^{-3/2}).

Finally, they expand the empirical likelihood ratio:

    l_E(\mu) = n\lambda^T(I + \Delta)\lambda
    - \frac{4}{3}\sum_i \bigl[(\bar X - \mu)^T X_i\bigr]^3 + R_n,
    \quad R_n = O_p(n^{-1}).


Asymptotics - IV

By obtaining a similar expansion for the parametric likelihoods, DiCiccio, Hall and
Romano (1989) conclude that the two do not necessarily agree even to the first
order, i.e. O_p(n^{-1/2}). They exemplify the point with a double exponential
distribution (where the mean is not the location parameter, however). The confidence
region coverage, however, has an error of order O(n^{-1/2}), i.e. the empirical
likelihood provides the right nominal coverage.
Bivariate example - I

Owen (1990) illustrates the multivariate extension with the genetic experiment on
duck plumage data (see references in the article). Fig. 1 shows the empirical
likelihood contours that used 20/9 times the F_{2,9} distribution (a small-sample
correction instead of the asymptotic \chi^2(2)). Fig. 2 shows parametric likelihood
ratio contours for Hotelling's T^2 statistic based on the F_{2,9} distribution.
Fig. 3 (not shown) attempts to construct the bootstrap-based confidence regions.

Bivariate example - II, III

[Figures 1 and 2: empirical likelihood and Hotelling's T^2 contours for the duck
plumage data.]


Regression - I

    Y = X\beta_0 + \varepsilon, \quad E[\varepsilon|X] = 0, \quad
    \operatorname{Var}[\varepsilon_i|X] = \sigma_i^2                            (20)

The stochastic regressors case is easier! Normal equations / estimating equations /
moment conditions (econometrics):

    Z_i = X_i(Y_i - X_i'\beta), \quad \beta = \beta_0 \Rightarrow EZ_i = 0      (21)

Empirical likelihood ratio test: whether EZ_i = 0.

    \log R(\beta) = \max\Bigl\{\sum_i \ln n w_i \,\Big|\, w_i \geq 0,\
    \sum_i w_i = 1,\ \sum_i w_i x_i (y_i - x_i'\beta) = 0\Bigr\}                (22)

One can also build CIs for a subset of r regressors by comparing -2\log R(\beta) to
\chi^2(r). Owen (1990): compare to a scaled F distribution?
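A sketch of the test in (21)-(22): under H_0 the estimating functions Z_i have mean
zero, so the vector-mean EL dual applies to the Z_i directly (the simulated design,
the damped Newton solver, and the helper name `neg2_log_R` are illustrative
assumptions, not from the source):

```python
import numpy as np

# EL regression test: Z_i = x_i*(y_i - x_i'beta0) has mean zero under H0, so we
# profile the EL ratio for a zero vector mean of the Z_i via the convex dual.
def neg2_log_R(z, iters=50):
    z = np.asarray(z, float)
    lam = np.zeros(z.shape[1])
    for _ in range(iters):
        t = 1.0 + z @ lam
        grad = -(z / t[:, None]).sum(axis=0)
        if np.abs(grad).max() < 1e-10:
            break
        hess = (z[:, :, None] * z[:, None, :] / (t ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess, -grad)
        s = 1.0
        while np.any(1.0 + z @ (lam + s * step) <= 0):  # keep weights positive
            s /= 2.0
        lam = lam + s * step
    return 2.0 * np.sum(np.log(1.0 + z @ lam))          # -2 log R(beta0)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])  # intercept + slope
beta0 = np.array([1.0, 2.0])
y = X @ beta0 + rng.normal(size=300)        # H0 holds in this simulation
Z = X * (y - X @ beta0)[:, None]            # Z_i = x_i*(y_i - x_i'beta0)
W = neg2_log_R(Z)
print(round(W, 3))                          # compare to chi^2(2) quantiles
```

Since the null is true in the simulation, W behaves like a \chi^2(2) draw.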
Regression - II

The case with fixed regressors is more difficult.

Theorem 3. If

    \mu_4(x) = \int (Y - x'\beta_0)^4 \, dF_x, \quad
    n^{-2}\sum_i \|x_i\|\,\mu_4(x_i) \to 0,

    \mathrm{Prob}\bigl[\operatorname{conv}\{x_i \,|\, Y_i - x_i'\beta_0 > 0\} \cap
    \operatorname{conv}\{x_i \,|\, Y_i - x_i'\beta_0 < 0\} \neq \emptyset\bigr]
    \to 1,

    a < \sigma^2(x_i) < b\|x_i\|^\alpha, \quad a, b > 0, \ \alpha \geq 0,

    a < \min \lambda\{X'X/n\}, \quad \frac{1}{n}\sum \|x_i\|^{2+\alpha} < b,

then

    -2\log R(\beta_0) \xrightarrow{d} \chi^2(p)                                 (23)

To the order O_p(n^{-1/2}), the empirical likelihood is equivalent to the Huber /
White / first-order linearization heteroskedasticity-consistent covariance matrix
estimator.


Regression - III

Misspecification: E(Z_i) = \mu_i, \operatorname{Var} Z_i = V_i is of full rank (in
the 1D case, V_i = \sigma_i^2). We are testing whether \bar\mu = n^{-1}\sum \mu_i
takes a given value \mu_0. The empirical likelihood test refers

    \frac{n(\bar Z - \mu_0)^2}{n^{-1}\sum_i (Z_i - \mu_0)^2}

to a \chi^2 distribution. The variance estimate in the denominator is biased upward
due to the model misspecification:

    E\Bigl[n^{-1}\sum_i (Z_i - \mu_0)^2\Bigr] = n^{-1}\sum \sigma_i^2 +
    n^{-1}\sum (\mu_i - \mu_0)^2

Thus, Owen (1991) concludes, confidence sets for \beta_0 will not have consistent
coverage, but will be conservative.

Small sample properties - I

Asymptotically, everything is standard normal or \chi^2; what is the scope for
empirical likelihood, then?

Hint: bias, skewness, heavy tails.

Owen (1990) argues that the empirical likelihood corrects the skewness as compared
to Student's t, and notes that Johnson's t (Johnson 1978) corrects for both skewness
and bias.


Edgeworth and Cornish-Fisher expansions - I

Edgeworth expansion: the one for distribution functions:

    \mathrm{Prob}\bigl[n^{1/2}(\hat\theta - \theta_0)/\sigma \leq x\bigr] =
    \Phi(x) + n^{-1/2} p_1(x)\phi(x) + n^{-1} p_2(x)\phi(x) + \dots +
    n^{-j/2} p_j(x)\phi(x) + \dots                                              (24)

Cornish-Fisher expansion: the one for quantiles:

    \mathrm{Prob}\bigl[n^{1/2}(\hat\theta - \theta_0)/\sigma \leq u_\alpha\bigr]
    = \alpha \Rightarrow
    u_\alpha = z_\alpha + n^{-1/2}\tilde p_1(z_\alpha) + n^{-1}\tilde p_2(z_\alpha)
    + \dots + n^{-j/2}\tilde p_j(z_\alpha) + \dots                              (25)
Edgeworth expansion - II

Sums of independent random variables: if X_1, X_2, \dots are i.i.d., then
S_n = n^{1/2}(\bar X - E[X])/\sqrt{\operatorname{Var} X} is asymptotically standard
normal. If

    \chi(t) = E\exp\Bigl[it\,\frac{X - E[X]}{(\operatorname{Var} X)^{1/2}}\Bigr]
            = \exp\Bigl[\sum_j \kappa_j (it)^j / j!\Bigr]                       (26)

is the characteristic function of the normalized distribution, where the \kappa_j's
are its cumulants, then

    \chi_n(t) \equiv \chi_{S_n}(t) = \chi(t/n^{1/2})^n
    = \exp\Bigl[-\frac{t^2}{2} + n^{-1/2}\kappa_3\frac{(it)^3}{3!} + \dots\Bigr]
    = e^{-t^2/2}\bigl[1 + n^{-1/2} r_1(it) + n^{-1} r_2(it) + \dots\bigr],      (27)

where r_j is a polynomial of degree 3j depending on the cumulants up to order j + 2.


Edgeworth expansion - III

In particular,

    r_1(u) = \frac{1}{6}\kappa_3 u^3, \quad
    r_2(u) = \frac{1}{24}\kappa_4 u^4 + \frac{1}{72}\kappa_3^2 u^6.             (28)

As \chi_n(t) = \int e^{itx} \, dF_{S_n}(x) = \int e^{itx} \, d\bigl[\Phi(x) +
\sum_j n^{-j/2} R_j(x)\bigr], the functions R_j(\cdot) are the solutions of
\int e^{itx} \, dR_j(x) = r_j(it)\, e^{-t^2/2}, which can be shown to be related to
the derivatives of the normal CDF:

    R_j(x) = r_j\Bigl(-\frac{d}{dx}\Bigr)\Phi(x)                                (29)

Then the first two terms of the Edgeworth expansion for the sample mean are

    R_1(x) = -\frac{1}{6}\kappa_3 (x^2 - 1)\phi(x),                             (30)

    R_2(x) = -x\Bigl[\frac{1}{24}\kappa_4 (x^2 - 3) +
             \frac{1}{72}\kappa_3^2 (x^4 - 10x^2 + 15)\Bigr]\phi(x)             (31)
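The one-term correction (30) can be compared with simulation; a sketch for the
standardized mean of Exp(1) data (an assumed example, whose standardized third
cumulant is \kappa_3 = 2):

```python
import numpy as np
from math import erf, exp, pi, sqrt

# One-term Edgeworth correction Phi(x) + n^{-1/2}*R_1(x) for the standardized
# mean of Exp(1) data (kappa_3 = 2), compared with a Monte Carlo CDF.
n, kappa3 = 20, 2.0
rng = np.random.default_rng(3)
s = np.sqrt(n) * (rng.exponential(size=(200_000, n)).mean(axis=1) - 1.0)

def phi(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

for x in (-1.5, 0.0, 1.5):
    mc = np.mean(s <= x)                                  # Monte Carlo CDF
    edgeworth = Phi(x) - kappa3 * (x * x - 1) * phi(x) / (6 * sqrt(n))
    print(x, round(Phi(x), 4), round(edgeworth, 4), round(mc, 4))
```

At these x the corrected value is visibly closer to the Monte Carlo CDF than
\Phi(x) alone; note the correction vanishes at x = \pm 1, where x^2 - 1 = 0.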
Cornish-Fisher expansion - IV

Owen (1990) expands the likelihood ratio (of the mean parameter):

    -2\log R = t_\mu^2 + 2n^{-1/2} t_\mu^3 \gamma_\mu/3 + o(n^{-1/2})           (32)

and concludes that the Cornish-Fisher expansion

    CF\bigl(\sqrt{-2\log R}\bigr) = Z_1 - n^{-1/2}\gamma/6 - n^{-1/2} A Z_1 Z_2
    + o(n^{-1/2}), \quad Z_1, Z_2 \sim \text{i.i.d. } N(0, 1)

indicates bias in the empirical likelihood ratio. The bias disappears for the
two-sided CIs, though. Cf. the expansion for Student's t:

    CF(t) = Z_1 - n^{-1/2}\gamma/6 - n^{-1/2}\gamma Z_1^2/3 - n^{-1/2} A Z_1 Z_2
    + o(n^{-1/2})                                                               (33)
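The leading Cornish-Fisher quantile correction for the standardized mean,
u_\alpha \approx z_\alpha + n^{-1/2}\kappa_3(z_\alpha^2 - 1)/6, can likewise be
checked against Monte Carlo quantiles (Exp(1) data with \kappa_3 = 2 are an assumed
example):

```python
import numpy as np
from scipy.stats import norm

# First Cornish-Fisher term for the 95% quantile of the standardized mean of
# Exp(1) data (kappa_3 = 2), compared with a Monte Carlo quantile.
n, kappa3, alpha = 20, 2.0, 0.95
rng = np.random.default_rng(4)
s = np.sqrt(n) * (rng.exponential(size=(200_000, n)).mean(axis=1) - 1.0)

z = norm.ppf(alpha)                                   # normal quantile z_alpha
cf = z + kappa3 * (z * z - 1) / (6 * np.sqrt(n))      # corrected quantile
mc = np.quantile(s, alpha)                            # simulated quantile
print(round(z, 4), round(cf, 4), round(mc, 4))
```

The corrected quantile tracks the simulated one much more closely than z_\alpha
does, reflecting the skewness adjustment discussed above.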
Corrections - II

If \hat\theta is an estimator of the parameter \theta_0 with
Q = \lim \operatorname{Cov}[n^{1/2}\hat\theta], and \hat Q is an estimate of the
asymptotic covariance matrix (bootstrap, jackknife, unknowns replaced by the sample
estimates), then the density of \hat\eta_0 = \hat Q^{-1/2}(\hat\theta - \theta_0)
can be approximated by N(0, I). Hall (1990) shows that the empirical likelihood
regions are rather based on
\hat\xi_0 = (Q^{1/2}\hat Q^{-1} Q^{1/2})^{1/2} Q^{-1/2}(\hat\theta - \theta_0).
They agree to the second order with the pseudo-likelihood contours based on
\hat\xi_0 + n^{-1}\psi for a certain fixed \psi.


Corrections - III

(i) \mu_0 = E(X) (assume 0), \operatorname{Cov} X = \Sigma_0 (assume I),
\theta: R^r \to R^s, so \theta = \theta(\mu).

(ii) \Theta = \partial\theta/\partial\mu|_{\mu=\mu_0}, Q = \Theta\Sigma_0\Theta^T,
\hat Q = \hat\Theta\hat\Sigma_0\hat\Theta^T, R = \Theta^T Q^{-1}\Theta.

(iii) With the third moments \alpha^{jkl} = E(X^j X^k X^l), the fixed vector \psi is
built from terms of the form -\tfrac12 (Q^{-1/2}\Theta)^{uj} R^{kl}\alpha^{jkl},
together with terms involving the second derivatives \theta_{jk} of \theta combined
with Q, \Theta and R (the full expression is given in Hall 1990).

(iv) \hat\xi = \hat\xi_0 + n^{-1}\psi \Rightarrow
g(y) = \phi(y)\bigl(1 + n^{-1/2} q(y) + O(n^{-1})\bigr) is the density of \hat\xi;
q(y) is a cubic polynomial in y.

(v) T(x) = n x^T x - 2n\,q(x) + s\ln(2\pi/n) + O_p(n^{-1}) \Rightarrow
l_\theta(\theta(\nu)) = T(\hat\xi(\nu)) + s\ln(n/2\pi) + O_p(n^{-1}).
Corrections - IV

Mentioned earlier: first-order correct coverage of the CIs:

    \mathrm{Prob}[\theta_0 \in R(x)] = \mathrm{Prob}[\chi^2(s) \leq x]
    + O(n^{-1}),                                                                (34)

Bartlett correction: define b by

    E[l_\theta(\mu_0)] = s(1 + n^{-1}b) + O(n^{-2}).                            (35)

If \hat b is a \sqrt{n}-consistent estimator of b (say, obtained by replacing
unknowns by their sample analogues), then the Bartlett-corrected confidence region is

    R_B(x) = \bigl\{\theta(\nu) \,\big|\, l_\theta(\nu) \leq
    x(1 + n^{-1}\hat b)\bigr\}                                                  (36)

The result is second-order correct coverage:

    \mathrm{Prob}[\theta_0 \in R_B(x)] = \mathrm{Prob}[\chi^2(s) \leq x]
    + O(n^{-2}).                                                                (37)


Conclusion

- Empirical likelihood is a non-parametric method of inference on distribution
  functionals.

- It is applicable in many settings, including means, M-estimates, moments, and
  smooth functions of moments (correlation, regression).

- Empirical likelihood improves upon the normal approximation by adjusting for
  skewness / third-order properties of the population distribution.

- Further improvement of coverage is achieved through location and/or Bartlett
  corrections.


References

Cox, D. R., and D. V. Hinkley (1974). Theoretical Statistics. Chapman and Hall,
London.

DiCiccio, T. J., P. Hall, and J. P. Romano (1989). Comparison of Parametric and
Empirical Likelihood Functions. Biometrika, 76 (3), 465–476.

Hall, P. (1990). Pseudo-Likelihood Theory for Empirical Likelihood. Ann. Stat.,
18 (1), 121–140.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, NY.

Johnson, N. J. (1978). Modified t tests and confidence intervals for asymmetrical
populations. JASA, 73, 536–544.

Owen, A. (1988). Empirical Likelihood Ratio Confidence Intervals for a Single
Functional. Biometrika, 75 (2), 237–249.

Owen, A. (1990). Empirical Likelihood Ratio Confidence Regions. Ann. Stat., 18 (1),
90–120.

Owen, A. (1991). Empirical Likelihood for Linear Models. Ann. Stat., 19 (4),
1725–1747.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, NY.