
Statistics 512 Notes 2

Confidence Intervals
Definition: For a sample $X_1, \ldots, X_n$ from a model $\{P_\theta : \theta \in \Theta\}$, a $(1-\alpha)$ confidence interval for a parameter $g(\theta)$ is an interval $C_n = [a(X_1, \ldots, X_n), b(X_1, \ldots, X_n)]$ such that
$$P_\theta(g(\theta) \in C_n) \geq 1 - \alpha \quad \text{for all } \theta.$$
In words, $C_n$ is a function of the random sample that traps the parameter $g(\theta)$ with probability at least $(1-\alpha)$.
Commonly, people use 95% confidence intervals, which corresponds to choosing $\alpha = 0.05$.
Example: Suppose $X_1, X_2, X_3, X_4$ are iid $N(\mu, 1)$. The interval $C = [\bar{X} - 1, \bar{X} + 1]$ is a 0.9544 confidence interval for $\mu$:
$$
\begin{aligned}
P(\mu \in [\bar{X} - 1, \bar{X} + 1]) &= P(\bar{X} - 1 \leq \mu \leq \bar{X} + 1) \\
&= P(-1 \leq \bar{X} - \mu \leq 1) \\
&= P\left(-2 \leq \frac{\bar{X} - \mu}{1/\sqrt{4}} \leq 2\right) \\
&= P(-2 \leq Z \leq 2) \\
&= 0.9544
\end{aligned}
$$
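As a quick check on this coverage claim, here is a minimal simulation sketch (not part of the original notes; it assumes NumPy is available and uses an arbitrary true mean) that repeatedly draws samples of size 4 and records how often $[\bar{X}-1, \bar{X}+1]$ traps $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0            # arbitrary fixed true mean; coverage does not depend on it
n_reps = 100_000    # number of repeated samples of size 4

# Draw n_reps samples of size 4 from N(mu, 1) and form the sample means
xbars = rng.normal(loc=mu, scale=1.0, size=(n_reps, 4)).mean(axis=1)

# The interval [xbar - 1, xbar + 1] traps mu exactly when |xbar - mu| <= 1
coverage = np.mean(np.abs(xbars - mu) <= 1.0)
print(coverage)  # should come out close to 0.9544
```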
Motivation for confidence intervals:
A confidence interval can be thought of as an estimate of the parameter, i.e., we estimate $\mu$ by $[\bar{X} - 1, \bar{X} + 1]$ rather than by the point estimate $\bar{X}$.
What is gained by using the interval rather than the point estimate, since the interval is less precise?
We gain confidence. We have the assurance that in 95.44% of repeated samples, the confidence interval will contain $\mu$.
In practice, confidence intervals are usually used along
with point estimates to give a sense of the accuracy of the
point estimate.
Interpretation of confidence intervals
A confidence interval is not a probability statement about $g(\theta)$, since $g(\theta)$ is a fixed parameter, not a random variable.
Common textbook interpretation: If we repeat the
experiment over and over, a 95% confidence interval will
contain the parameter 95% of the time. This is correct but
not particularly useful since we rarely repeat the same
experiment over and over.
More useful interpretation (Wasserman, All of Statistics): On day 1, you collect data and construct a 95 percent confidence interval for a parameter $\theta_1$. On day 2, you collect new data and construct a 95 percent confidence interval for an unrelated parameter $\theta_2$. On day 3, you collect new data and construct a 95 percent confidence interval for an unrelated parameter $\theta_3$. You continue this way, constructing 95 percent confidence intervals for a sequence of unrelated parameters $\theta_1, \theta_2, \ldots$. Then 95 percent of your intervals will trap the true parameter value.
Confidence interval is not a probability statement about $\theta$:
The fact that a confidence interval is not a probability statement about $\theta$ is confusing. Let $\theta$ be a fixed, known real number and let $X_1, X_2$ be iid random variables such that $P(X_i = 1) = P(X_i = -1) = 1/2$. Now define $Y_i = \theta + X_i$ and suppose we only observe $Y_1, Y_2$. Define the following confidence interval, which actually contains only one point:
$$
C = \begin{cases} \{Y_1 - 1\} & \text{if } Y_1 = Y_2 \\ \{(Y_1 + Y_2)/2\} & \text{if } Y_1 \neq Y_2 \end{cases}
$$
No matter what $\theta$ is, we have $P_\theta(\theta \in C) = 3/4$, so this is a 75 percent confidence interval. Suppose we now do the experiment and we get $Y_1 = 15$ and $Y_2 = 17$. Then our 75 percent confidence interval is $\{16\}$. However, we are certain that $\theta$ is 16.
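As a check on the $3/4$ coverage claim, here is a minimal simulation sketch (not part of the original notes; it assumes NumPy and uses $\theta = 16$ purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 16.0       # fixed true parameter; any value gives the same coverage
n_reps = 100_000

hits = 0
for _ in range(n_reps):
    x = rng.choice([-1.0, 1.0], size=2)             # X1, X2 are +/-1 with prob 1/2 each
    y1, y2 = theta + x                              # observed Y_i = theta + X_i
    c = y1 - 1.0 if y1 == y2 else (y1 + y2) / 2.0   # the one-point "interval" C
    hits += (c == theta)
print(hits / n_reps)  # should come out close to 0.75
```

Note that once we see $Y_1 \neq Y_2$, the interval is certain to contain $\theta$; the 75 percent refers only to coverage over repeated experiments, which is exactly the point of the example.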
Some common confidence intervals
1. CI for mean of normal distribution with known variance: $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$ where $\sigma^2$ is known. Then
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Let $z_\alpha = \Phi^{-1}(\alpha)$, where $\Phi$ is the CDF of a standard normal random variable, e.g., $z_{.975} = 1.96$. We have
$$
\begin{aligned}
1 - \alpha &= P\left(-z_{1-\frac{\alpha}{2}} \leq \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \leq z_{1-\frac{\alpha}{2}}\right) \\
&= P\left(-z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \bar{X} - \mu \leq z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\bar{X} - z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right).
\end{aligned}
$$
Thus, $\bar{X} \pm z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$ is a $(1-\alpha)$ CI for $\mu$.
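As an illustration of this formula (not part of the original notes), here is a minimal Python sketch; it assumes NumPy and SciPy are available, and the data vector and value of $\sigma$ are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def z_interval(x, sigma, alpha=0.05):
    """(1 - alpha) CI for the mean when the standard deviation sigma is known."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = norm.ppf(1 - alpha / 2)            # z_{1 - alpha/2}
    half_width = z * sigma / np.sqrt(n)
    return x.mean() - half_width, x.mean() + half_width

# Hypothetical data, purely for illustration
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
print(z_interval(x, sigma=1.0))
```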
2. CI for mean of normal distribution with unknown variance: $X_1, \ldots, X_n$ iid $N(\mu, \sigma^2)$ where $\sigma^2$ is unknown.
Key fact: The random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, where $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, has a Student's t-distribution with $n-1$ degrees of freedom. (Section 3.6.3, page 186)
Let $t_{\alpha, n}$ be the inverse of the CDF of the Student's t-distribution with $n$ degrees of freedom evaluated at $\alpha$. Note $t_{\alpha, n} = -t_{1-\alpha, n}$.
Following the same steps as above, we have
$$
\begin{aligned}
1 - \alpha &= P\left(-t_{1-\frac{\alpha}{2}, n-1} \leq \frac{\bar{X} - \mu}{S/\sqrt{n}} \leq t_{1-\frac{\alpha}{2}, n-1}\right) \\
&= P\left(-t_{1-\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}} \leq \bar{X} - \mu \leq t_{1-\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}}\right) \\
&= P\left(\bar{X} - t_{1-\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{1-\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}}\right).
\end{aligned}
$$
Thus, $\bar{X} \pm t_{1-\frac{\alpha}{2}, n-1} \frac{S}{\sqrt{n}}$ is a $(1-\alpha)$ CI for $\mu$.
Note: $t_{1-\frac{\alpha}{2}, n-1} > z_{1-\frac{\alpha}{2}}$, so we pay a price for not knowing the variance, but as $n \to \infty$, $t_{1-\frac{\alpha}{2}, n-1} \to z_{1-\frac{\alpha}{2}}$.
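A matching sketch for the t-interval (again not part of the original notes; it assumes NumPy and SciPy, and the data are hypothetical) differs only in using the sample standard deviation $S$ and a t quantile with $n-1$ degrees of freedom:

```python
import numpy as np
from scipy.stats import t

def t_interval(x, alpha=0.05):
    """(1 - alpha) CI for the mean when the variance is unknown."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)                       # sample standard deviation S
    tq = t.ppf(1 - alpha / 2, df=n - 1)     # t_{1 - alpha/2, n-1}
    half_width = tq * s / np.sqrt(n)
    return x.mean() - half_width, x.mean() + half_width

# Same hypothetical data as before, purely for illustration
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
print(t_interval(x))
```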
3. CI for mean of iid sample from unknown distribution:
Central Limit Theorem (Theorem 4.4.1): For an iid sample from a distribution that has mean $\mu$ and positive variance $\sigma^2$, the random variable $Y_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$ converges in distribution to a standard normal random variable.
Slutsky's Theorem (Theorem 4.3.5): If $X_n \xrightarrow{D} X$, $A_n \xrightarrow{P} a$, and $B_n \xrightarrow{P} b$, then $A_n + B_n X_n \xrightarrow{D} a + bX$.
From the weak law of large numbers, if $E(X^4) < \infty$, then
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \xrightarrow{P} \sigma^2.$$
Thus, combining Slutsky's Theorem and the central limit theorem,
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \xrightarrow{D} N(0, 1).$$
An approximate $(1-\alpha)$ CI for $\mu$ is $\bar{X} \pm z_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}$ because
$$
\begin{aligned}
1 - \alpha &\approx P\left(-z_{1-\frac{\alpha}{2}} \leq \frac{\bar{X} - \mu}{S/\sqrt{n}} \leq z_{1-\frac{\alpha}{2}}\right) \\
&= P\left(-z_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}} \leq \bar{X} - \mu \leq z_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right) \\
&= P\left(\bar{X} - z_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right).
\end{aligned}
$$
Application: A food-processing company is considering
marketing a new spice mix for Creole and Cajun cooking.
They interview 200 consumers and find that 37 would
purchase such a product. Find an approximate 95%
confidence interval for p, the true proportion of buyers.
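One way to work this out (a sketch, not part of the original notes): treat each interviewed consumer as an iid Bernoulli($p$) observation, so $\bar{X} = \hat{p} = 37/200 = 0.185$, and for large $n$ the sample standard deviation satisfies $S/\sqrt{n} \approx \sqrt{\hat{p}(1-\hat{p})/n}$. Applying the approximate CI from item 3 with $z_{.975} = 1.96$ gives
$$\hat{p} \pm z_{.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.185 \pm 1.96\sqrt{\frac{(0.185)(0.815)}{200}} \approx 0.185 \pm 0.054 = [0.131, 0.239].$$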