Winter 2011
Note to graders: Please weight each of the 5 problems out of 20 points for a total of 100 points.
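where $\omega_1 = \{(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$ and $\omega_0 = \{(\mu, \sigma^2) : \mu = \mu_0,\ \sigma^2 > 0\}$. By definition, the maximum in the numerator is achieved when the MLEs are substituted for $\mu$ and $\sigma^2$.
To obtain the denominator, we need to maximize the function
\[
f(\sigma^2) = \mathrm{lik}(\mu_0, \sigma^2) = \frac{1}{(2\pi)^{n/2}\,\sigma^n} \exp\left( -\frac{\sum_{i=1}^n (X_i - \mu_0)^2}{2\sigma^2} \right),
\]
which attains its maximum at $\hat\sigma_0^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2$.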
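Therefore,
\[
\Lambda
= \frac{\dfrac{1}{(2\pi)^{n/2}} \left( \dfrac{n}{\sum_{i=1}^n (X_i - \bar X)^2} \right)^{n/2} \exp\left( -\dfrac{\sum_{i=1}^n (X_i - \bar X)^2 / 2}{\sum_{i=1}^n (X_i - \bar X)^2 / n} \right)}
       {\dfrac{1}{(2\pi)^{n/2}} \left( \dfrac{n}{\sum_{i=1}^n (X_i - \mu_0)^2} \right)^{n/2} \exp\left( -\dfrac{\sum_{i=1}^n (X_i - \mu_0)^2 / 2}{\sum_{i=1}^n (X_i - \mu_0)^2 / n} \right)}
= \left( \frac{\sum_{i=1}^n (X_i - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2} \right)^{n/2}.
\]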
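This test rejects $H_0$ if $\Lambda$ is large; that is, if $\Lambda > c$ for a cutoff point $c \ge 1$ (note that $\Lambda \ge 1$ always) that satisfies
\[
\mathrm{Prob}[\Lambda > c \mid H_0] = \alpha,
\]
where $\alpha$ is the specified significance level. We have to show that this test is equivalent to a t-test. Indeed,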
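\[
\begin{aligned}
\Lambda > c
&\iff \frac{\sum_{i=1}^n (X_i - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2} > c^{2/n} \\
&\iff \frac{\sum_{i=1}^n (X_i - \bar X + \bar X - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2} > c^{2/n} \\
&\iff \frac{\sum_{i=1}^n (X_i - \bar X)^2 + \sum_{i=1}^n (\bar X - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2} > c^{2/n} \\
&\iff 1 + \frac{n(\bar X - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar X)^2} > c^{2/n} \\
&\iff \frac{|\bar X - \mu_0|}{s/\sqrt{n}} > \sqrt{(n-1)\left(c^{2/n} - 1\right)},
\end{aligned}
\]
where $s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$. The third line uses the fact that the cross term $2(\bar X - \mu_0)\sum_{i=1}^n (X_i - \bar X)$ vanishes.
Taking $c' = \sqrt{(n-1)\left(c^{2/n} - 1\right)}$ to be equal to $t_{n-1,\alpha/2}$, we have a two-sided t-test.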
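The equivalence can be checked numerically. The following R sketch uses simulated data (the sample, `mu0`, and the seed are illustrative assumptions, not part of the problem) to confirm that $\Lambda = \left(1 + t^2/(n-1)\right)^{n/2}$, so that $\Lambda$ is an increasing function of $|t|$.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n   <- 20                        # illustrative sample size
mu0 <- 0                         # hypothesized mean (assumed for this example)
x   <- rnorm(n, mean = 0.3, sd = 2)

# Likelihood ratio statistic, using the closed form derived above
lambda <- (sum((x - mu0)^2) / sum((x - mean(x))^2))^(n / 2)

# One-sample t statistic
tstat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Lambda written as a function of t
lambda_from_t <- (1 + tstat^2 / (n - 1))^(n / 2)

all.equal(lambda, lambda_from_t)  # agreement up to floating-point error
```

Because the map from $|t|$ to $\Lambda$ is strictly increasing, rejecting for large $\Lambda$ and rejecting for large $|t|$ define the same rejection region.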
Then
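\[
\mathrm{Prob}\left[ \chi^2_{n-1,1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{n-1,\alpha/2} \right] = 1 - \alpha,
\]
so
\[
\mathrm{Prob}\left[ \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} \right] = 1 - \alpha.
\]
Therefore, $L_{\sigma^2}(\alpha) = \dfrac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}}$ and $U_{\sigma^2}(\alpha) = \dfrac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$ determine a $(1-\alpha)100\%$ confidence interval for $\sigma^2$. And, clearly,
\[
\mathrm{Prob}[L_{\sigma^2}(\alpha) > \sigma^2] = \alpha/2 = \mathrm{Prob}[U_{\sigma^2}(\alpha) < \sigma^2].
\]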
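As a minimal sketch, the interval can be computed in R as follows. The helper name `chisq_var_ci` and the simulated sample are assumptions made for illustration. Note that $\chi^2_{n-1,\alpha/2}$ here denotes the upper $\alpha/2$ point, which corresponds to `qchisq(1 - alpha/2, n - 1)`.

```r
# Equal-tailed (1 - alpha) confidence interval for sigma^2 from a normal sample.
# chisq_var_ci is a hypothetical helper name, not from the original solution.
chisq_var_ci <- function(x, alpha = 0.05) {
  n  <- length(x)
  ss <- (n - 1) * var(x)                              # (n - 1) s^2
  c(lower = ss / qchisq(1 - alpha / 2, df = n - 1),   # divide by upper alpha/2 point
    upper = ss / qchisq(alpha / 2,     df = n - 1))   # divide by lower alpha/2 point
}

set.seed(2)
x  <- rnorm(30, mean = 5, sd = 3)   # simulated data with true sigma^2 = 9
ci <- chisq_var_ci(x)
ci
```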
(b) Since $\bar X$ and $s^2$ are independent, it is tempting to think that
\[
\mathrm{Prob}\left[ \mu \in [L_\mu(\alpha), U_\mu(\alpha)],\ \sigma^2 \in [L_{\sigma^2}(\alpha), U_{\sigma^2}(\alpha)] \right] \stackrel{?}{=} (1 - \alpha)^2.
\]
NO!! The random variables here are the interval endpoints, not $\mu$ and $\sigma^2$, and since $s^2$ appears in both pairs of endpoints, there is no reason to think that $(L_\mu(\alpha), U_\mu(\alpha))$ and $(L_{\sigma^2}(\alpha), U_{\sigma^2}(\alpha))$ are independent.
The event that “$\mu \in [L_\mu(\alpha), U_\mu(\alpha)]$ and $\sigma^2 \in [L_{\sigma^2}(\alpha), U_{\sigma^2}(\alpha)]$” can be represented as a region in the $(\bar X, s)$ plane bounded by the following four lines:
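\[
\begin{aligned}
L_1 &: \quad s = \sqrt{\sigma^2\, \chi^2_{n-1,\alpha/2} / (n-1)} \\
L_2 &: \quad s = \sqrt{\sigma^2\, \chi^2_{n-1,1-\alpha/2} / (n-1)} \\
L_3 &: \quad \bar X = \mu + \frac{t_{n-1,\alpha/2}}{\sqrt{n}}\, s \\
L_4 &: \quad \bar X = \mu - \frac{t_{n-1,\alpha/2}}{\sqrt{n}}\, s
\end{aligned}
\]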
This region is sketched in Figure 1. One way to obtain the exact probability of the event is to compute
the double integral of the joint density of X̄ and s (which is the product of the corresponding marginal
densities, using now the fact that X̄ and s are independent) over that region.
Figure 1: Sketch of the region on the X̄ − s plane that corresponds to the event in question 2(b)
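At this point, though, a reasonable bound on the probability of the event is enough. We obtain one by applying the Bonferroni inequality:
\[
\mathrm{Prob}\left[ \mu \in [L_\mu(\alpha), U_\mu(\alpha)],\ \sigma^2 \in [L_{\sigma^2}(\alpha), U_{\sigma^2}(\alpha)] \right]
\ge 1 - \mathrm{Prob}\left[ \mu \notin [L_\mu(\alpha), U_\mu(\alpha)] \right] - \mathrm{Prob}\left[ \sigma^2 \notin [L_{\sigma^2}(\alpha), U_{\sigma^2}(\alpha)] \right]
= 1 - 2\alpha.
\]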
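A small simulation gives a feel for how the joint coverage compares with the Bonferroni lower bound $1 - 2\alpha$ and with the naive value $(1-\alpha)^2$. This is only a sketch: the values of `n`, `mu`, `sigma`, `alpha`, and `nsim` are arbitrary choices for illustration, and the simulated proportion carries Monte Carlo error.

```r
set.seed(3)
n <- 10; mu <- 0; sigma <- 1; alpha <- 0.05   # illustrative values
nsim <- 20000

tq  <- qt(1 - alpha / 2, n - 1)        # t_{n-1, alpha/2}
qlo <- qchisq(alpha / 2, n - 1)        # chi^2_{n-1, 1-alpha/2} (lower point)
qhi <- qchisq(1 - alpha / 2, n - 1)    # chi^2_{n-1, alpha/2}   (upper point)

both <- replicate(nsim, {
  x  <- rnorm(n, mu, sigma)
  m  <- mean(x)
  s2 <- var(x)
  cover_mu  <- abs(m - mu) <= tq * sqrt(s2 / n)
  cover_var <- (n - 1) * s2 / qhi <= sigma^2 && sigma^2 <= (n - 1) * s2 / qlo
  cover_mu && cover_var                # TRUE when both intervals cover
})

joint <- mean(both)
c(simulated = joint, bonferroni = 1 - 2 * alpha, naive = (1 - alpha)^2)
```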
since then, as above,
\[
\left( \frac{(n-1)s^2}{b},\ \frac{(n-1)s^2}{a} \right)
\]
is a $100(1-\alpha)\%$ CI for $\sigma^2$. The length of this interval is $(n-1)s^2\left( \frac{1}{a} - \frac{1}{b} \right)$, so we need to find the choice of $a$ and $b$ that minimizes $\frac{1}{a} - \frac{1}{b}$, subject to the constraint (*).
To do that, notice that the constraint (*) determines b as a function of a, so we can use implicit
differentiation:
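\[
\frac{d}{da} \int_a^b f_{n-1}(x)\, dx = \frac{d}{da}(1 - \alpha),
\]
where $f_{n-1}$ is the density function for the chi-square distribution with $n-1$ degrees of freedom. Picking an arbitrary $c > 0$, we obtain:
\[
\frac{d}{da} \int_a^c f_{n-1}(x)\, dx + \frac{d}{da} \int_c^b f_{n-1}(x)\, dx = 0,
\]
that is, $-f_{n-1}(a) + f_{n-1}(b)\,\dfrac{db}{da} = 0$, so $\dfrac{db}{da} = \dfrac{f_{n-1}(a)}{f_{n-1}(b)}$.
Now:
\[
\frac{d}{da}\left( \frac{1}{a} - \frac{1}{b} \right) = -\frac{1}{a^2} + \frac{1}{b^2}\, \frac{f_{n-1}(a)}{f_{n-1}(b)},
\]
and setting this derivative equal to zero gives the condition (**): $a^2 f_{n-1}(a) = b^2 f_{n-1}(b)$.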
To verify that we have actually found a minimum, a second implicit differentiation of $\dfrac{db}{da} = \dfrac{f_{n-1}(a)}{f_{n-1}(b)}$ yields
\[
\frac{d^2 b}{da^2} = \frac{f'_{n-1}(a)}{f_{n-1}(b)} - f'_{n-1}(b)\, \frac{f_{n-1}(a)^2}{f_{n-1}(b)^3}.
\]
For the usual values of $\alpha$, $a$ must be small enough and $b$ large enough that $f_{n-1}$ is increasing at $a$ and decreasing at $b$, guaranteeing that $\dfrac{d^2 b}{da^2} > 0$.
Consequently, an interval of minimum length will be found for any values of $a$, $b$ that satisfy (*) and (**). It does not seem possible to solve for $a$, $b$ explicitly, but it is easy to find numerically the value of $\xi \in [0, \alpha)$ that makes the expression
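One way to carry out such a numerical search is sketched below in R. The parametrization is an assumption made for this example: $a$ is taken to be the $\xi$ quantile and $b$ the $1 - \alpha + \xi$ quantile of the chi-square distribution with $n-1$ degrees of freedom, so that the coverage constraint holds by construction, and $\frac{1}{a} - \frac{1}{b}$ is then minimized over $\xi$. The values of `alpha` and `df` and the name `len_factor` are illustrative.

```r
alpha <- 0.05
df    <- 9            # n - 1, for an illustrative n = 10

# For xi in (0, alpha), probability xi lies below a and alpha - xi above b,
# so P(a < chi^2 < b) = 1 - alpha automatically.
len_factor <- function(xi) {
  a <- qchisq(xi, df)
  b <- qchisq(1 - alpha + xi, df)
  1 / a - 1 / b       # interval length divided by (n - 1) s^2
}

opt <- optimize(len_factor, interval = c(1e-8, alpha - 1e-8), tol = 1e-10)
xi  <- opt$minimum
a   <- qchisq(xi, df)
b   <- qchisq(1 - alpha + xi, df)

c(xi = xi, a = a, b = b,
  shortest   = opt$objective,
  equal_tail = len_factor(alpha / 2))

# First-order condition: a^2 f(a) and b^2 f(b) should nearly agree at the optimum
c(a^2 * dchisq(a, df), b^2 * dchisq(b, df))
```

Comparing `shortest` with `equal_tail` shows how much length the equal-tailed interval gives up relative to the optimized one for this choice of `df`.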
3. (Confidence intervals: pooled versus paired procedure)
(a) Proof: Suppose ρ = Corr[Xi , Yi ] = 1 for i = 1, . . . , n. Then, there exist ai , bi , i = 1, . . . , n such that
for each i,
Prob[Xi = ai + bi Yi ] = 1
(by Theorem B, p. 143 [3rd ed.] or p. 133 [2nd ed.] in Rice).
Now, for each $i = 1, \dots, n$, $\sigma^2 = \mathrm{Var}(X_i) = b_i^2\, \mathrm{Var}(Y_i) = b_i^2 \sigma^2$, so $|b_i| = 1$. Furthermore, since the correlation is positive, each $b_i$ must be positive. Consequently, all the $b_i$'s equal 1, and from that we can deduce that for each $i$
All the $D_i$'s being the same, we conclude that $s_{\bar D} = 0$, and therefore the confidence interval
\[
\bar D \pm t_{n-1,\alpha/2}\, \frac{s_{\bar D}}{\sqrt{n}}
\]
since the sample covariance equals 0. That is, $s^2_{\bar D} = s^2_X + s^2_Y$. Consequently, the ratio of the length of the “paired interval” over that of the “pooled interval” is
\[
\frac{t_{n-1}(\alpha/2)}{t_{2n-2}(\alpha/2)}.
\]
Since $2n - 2 \ge n - 1$ for $n \ge 1$, we have that $t_{n-1}(\alpha/2) \ge t_{2n-2}(\alpha/2)$. Therefore, the “paired interval” is longer.
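The size of this ratio can be inspected numerically. The R sketch below evaluates it over an illustrative range of sample sizes; the range of `n` and the value of `alpha` are arbitrary assumptions.

```r
alpha <- 0.05
n     <- 2:50                                   # illustrative sample sizes

t_paired <- qt(1 - alpha / 2, df = n - 1)       # t_{n-1}(alpha/2)
t_pooled <- qt(1 - alpha / 2, df = 2 * n - 2)   # t_{2n-2}(alpha/2)
ratio    <- t_paired / t_pooled

all(ratio >= 1)                                 # paired interval is never shorter here
round(ratio[n %in% c(2, 5, 10, 30)], 3)         # ratio at a few sample sizes
```

The ratio is largest for very small $n$, where the loss of $n - 1$ degrees of freedom matters most.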
4. (Data analysis: Two samples I)
(a) Here is the R code:
d <- read.table("calcium-c11.txt")
n <- 118                       # number of observations
x <- log(d[, 1])
y <- log(d[, 2])
# Pooled procedure
xbar <- mean(x)
ybar <- mean(y)
ssqpool <- (n - 1) / (2 * n - 2) * (var(x) + var(y))   # pooled variance estimate
tpool <- (xbar - ybar) / sqrt(ssqpool * (2 / n))       # pooled t-statistic
2 * pt(-abs(tpool), 2 * n - 2)                         # two-sided p-value
The t-statistic equals 0.1817, and the corresponding p-value is 0.856. If this procedure were justified, then the evidence against the null would be extremely weak.
The t-statistic in this case equals 4.0362, and the p-value is $9.73 \times 10^{-5}$. If this procedure is justified, then the evidence is very strong against the null.
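The contrast between the two procedures can be reproduced on simulated data. The sketch below does not use the calcium data: `x` and `y` are simulated stand-ins chosen to be strongly correlated, and all numeric settings are assumptions made for illustration.

```r
set.seed(4)
n <- 118
x <- rnorm(n, mean = 3, sd = 1)                 # simulated stand-in for one method
y <- x - 0.05 + rnorm(n, mean = 0, sd = 0.1)    # second method, strongly correlated with x

# Pooled two-sample t statistic (treats x and y as independent samples)
ssqpool  <- (var(x) + var(y)) / 2
t_pooled <- (mean(x) - mean(y)) / sqrt(ssqpool * 2 / n)

# Paired t statistic (based on the within-pair differences)
dxy      <- x - y
t_paired <- mean(dxy) / (sd(dxy) / sqrt(n))

c(pooled = t_pooled, paired = t_paired, cor = cor(x, y))

# Built-in equivalent of the paired computation
t.test(x, y, paired = TRUE)$statistic
```

With strongly correlated measurements the differences have a much smaller standard deviation than either variable, which is why the paired statistic is so much larger in magnitude than the pooled one.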
(c) The two variables are strongly correlated, as can be seen from the scatterplot (Figure 2) and the sample correlation coefficient of 0.998. To compute a 95% confidence interval for the true correlation coefficient $\rho$, we use the Fisher Z transformation. Take
\[
Z = \frac{1}{2} \log \frac{1 + r}{1 - r},
\]
where $r$ represents possible values of the sample correlation coefficient. Then $Z$ is approximately normally distributed, regardless of the true value of $\rho$, with standard deviation $\frac{1}{\sqrt{n-3}}$. For $r^* = 0.998$ we obtain $Z^* = 3.453$. The margin of error in the Z scale is given by $1.96 \cdot \frac{1}{\sqrt{115}} = 0.1828$, so the endpoints of a 95% CI for the mean of $Z$ are 3.27 and 3.64. Applying the inverse Fisher Z transformation $r = \dfrac{e^{2Z} - 1}{e^{2Z} + 1}$, we obtain the endpoints of a 95% CI for $\rho$:
(0.997, 0.998).
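The Fisher Z computation can be sketched in R as follows, using the fact that the transformation is `atanh` and its inverse is `tanh`. The input `r <- 0.998` is the rounded value quoted in the text, so the last digit of the resulting endpoints may differ slightly from an interval computed from the unrounded sample correlation.

```r
r <- 0.998      # sample correlation as reported in the text (rounded)
n <- 118

z      <- atanh(r)                        # Fisher Z: 0.5 * log((1 + r) / (1 - r))
margin <- qnorm(0.975) / sqrt(n - 3)      # approximately 1.96 / sqrt(115)
z_ci   <- z + c(-1, 1) * margin           # 95% CI on the Z scale
rho_ci <- tanh(z_ci)                      # inverse: (exp(2Z) - 1) / (exp(2Z) + 1)

round(c(z = z, margin = margin), 4)
round(z_ci, 2)
rho_ci
```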
Figure 2: Scatterplot of “Flame method” vs. “Oxalate method” measurements (both log transformed).
(d) Clearly the paired procedure is more appropriate here given the strong correlation between the variables. The textbook suggests that there were 118 samples of feeds, and that each sample was measured with both methods; so, the design of the study also suggests pairing the observations.
There is strong evidence against the null hypothesis of no effect; the effect is negative, as can be seen from the sign of the t-value. The confidence interval is not very tight, but confirms that breathing ozone makes the rats lose weight.
Now we perform the same test assuming equal variances:
> t.test(ozone,control,var.equal=T)
Notice that the t-statistics are very close (this is due to the fact that the sample sizes are similar; if they were equal, then the t-statistic would not change at all), but the number of degrees of freedom is different. This indicates that when the sizes of the two samples are similar, the pooled procedure may perform reasonably well even if the variances are fairly different. With roughly equal sample sizes, the only real difference between the pooled and the Satterthwaite procedures is in the degrees of freedom, which are chosen more conservatively in the Satterthwaite procedure. Of course, if the sample size is large, then the t-distribution starts to look like a normal N(0, 1), and small changes in the degrees of freedom have little effect.
A quick look at the histograms for each group reveals no strong failures of the assumption of normality, so the inferences reached seem reasonably solid. See Figure 3 for boxplots of the two groups, obtained with the command boxplot(control,ozone,names=c("Control","Ozone")).
Figure 3: Boxplot of weight increase for the two groups: control and treated with ozone (corresponds to
Problem 5).
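The remark about equal sample sizes can be illustrated with a short R sketch. The data below are simulated stand-ins, not the rat weight data, and the group sizes, means, and standard deviations are arbitrary assumptions; the point is only that with equal group sizes the Welch (Satterthwaite) and pooled t-statistics coincide while the degrees of freedom differ.

```r
set.seed(5)
ctrl <- rnorm(22, mean = 20, sd = 10)   # simulated stand-in, NOT the rat data
trt  <- rnorm(22, mean = 10, sd = 20)   # same sample size, larger variance

welch  <- t.test(trt, ctrl)                     # Welch / Satterthwaite (R default)
pooled <- t.test(trt, ctrl, var.equal = TRUE)   # pooled-variance procedure

c(t_welch = unname(welch$statistic), t_pooled = unname(pooled$statistic))
c(df_welch = unname(welch$parameter), df_pooled = unname(pooled$parameter))
```

With equal group sizes both standard errors reduce to $\sqrt{(s_1^2 + s_2^2)/n}$, so only the reference distribution changes between the two procedures.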