Point Estimation, Interval Estimation, and Testing
October 1, 2007
Outline
1 Point Estimation
Sampling Distributions for Point Estimators
Small Sample Properties
Large Sample Properties
2 Interval Estimation
Sampling Distributions for Interval Estimators
Small Sample Properties
Large Sample Properties
3 Testing
Some Statistical Decision Theory
Sampling Distributions for Test Statistics
p-Values, Rejection Regions, and CIs
Point Estimation
Often we use a statistic to estimate (or guess) the value of a parameter; we denote such an estimator with a hat (e.g. θ̂). This kind of estimation is known as point estimation.
Point estimates are realized values of an estimator, and hence they are not random (e.g. x̄).
[Figure: histogram of income (density scale; income from 0 to 20)]
[Figure: histogram of income with the population density overlaid]
We may not have enough data to get a good estimate of the density (the infinite-data histogram), but we may have enough data to estimate one characteristic (parameter) of the density. Often we choose the balance point (the mean) as our parameter of interest.
[Figure: histogram of income with the population density and its balance point marked]
Clearly, some of these estimators are better than others (which ones?), but how can we
define “better”?
Illustrative Example:
X = the number of times a respondent voted in the last two presidential elections.
We will assume three possible values: {0, 1, 2}.

Assume P(x) = 1/4 for x = 0, 1/2 for x = 1, and 1/4 for x = 2.

Assume n = 2.
Exercise:
1 List all the possible samples
2 Calculate the probability of each sample under repeated sampling
3 Form the sampling distribution for the sample mean
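A minimal Python sketch of this exercise (the distribution and n = 2 come from the setup above; the code enumerates all samples exactly rather than simulating):

from itertools import product

# P(X = x) for the voting variable described above
px = {0: 0.25, 1: 0.50, 2: 0.25}
n = 2

# 1. List all possible samples of size n; the draws are independent,
#    so the probability of a sample is the product of the marginals.
samples = {s: px[s[0]] * px[s[1]] for s in product(px, repeat=n)}

# 2./3. Sampling distribution of the sample mean: group samples by x-bar.
sampling_dist = {}
for s, p in samples.items():
    xbar = sum(s) / n
    sampling_dist[xbar] = sampling_dist.get(xbar, 0.0) + p

for xbar in sorted(sampling_dist):
    print(f"P(Xbar = {xbar}) = {sampling_dist[xbar]:.4f}")
# Output: .0625 at 0, .25 at 0.5, .375 at 1, .25 at 1.5, .0625 at 2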
ANES Example
The next slide shows an approximation of this procedure for the four proposed
estimators. I simulated 10,000 data sets of size n from the density shown at the
beginning of the lecture notes.
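The original simulation code is not shown; a sketch along the following lines would reproduce the idea. The income density here is a stand-in assumption (a right-skewed gamma with mean 16), as is the sample size, and the four estimators are those examined below: Y1, ½(Y1 + Yn), the constant 7, and the sample mean Ȳn.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 10_000  # the notes only say "size n"; n = 25 is an assumption

# Stand-in for the income density from the start of the notes:
# a right-skewed gamma with mean 4 * 4 = 16 (an assumption).
data = rng.gamma(shape=4.0, scale=4.0, size=(reps, n))

mu_hat1 = data[:, 0]                        # Y1: first observation only
mu_hat2 = 0.5 * (data[:, 0] + data[:, -1])  # average of first and last
mu_hat3 = np.full(reps, 7.0)                # the constant 7
mu_hat4 = data.mean(axis=1)                 # Y-bar: the sample mean

for name, est in [("muHat1", mu_hat1), ("muHat2", mu_hat2),
                  ("muHat3", mu_hat3), ("muHat4", mu_hat4)]:
    print(f"{name}: mean = {est.mean():6.2f}, sd = {est.std():5.2f}")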
[Figure: simulated sampling distributions of muHat1, muHat2, muHat3, and muHat4]
Bias
Bias is the expected difference between the estimator and the parameter. Bias is not
the difference between an estimate and the parameter.
Bias(θ̂) = E[θ̂ − θ]
         = E[θ̂] − θ

Bias(X̄n) = E[X̄n − E[X]]
          = E[µ̂ − µ]
          = 0
Example
1 E[Y1] = µ
2 E[½(Y1 + Yn)] = ½(µ + µ) = µ
3 E[7] = 7
4 E[Ȳn] = (1/n)·nµ = µ
Estimators 1, 2, and 4 all get the right answer on average. Which is better?
[Figure: simulated sampling distributions of muHat1, muHat2, muHat3, and muHat4 (repeated)]
Election Example
Let π be the proportion of voters who will vote for the Republican candidate in the 2008
general election. Let’s examine two estimators.
1 µ̂ = Y1 = 1 if vote Republican, 0 otherwise
2 µ̂ = class guess
Which is unbiased?
Variance
All else equal, we prefer estimators with small variance. In particular, if two estimators
are unbiased, we prefer the estimator with the smaller variance.
Low variance means that under repeated sampling, the estimates are likely to be
similar.
Note that this doesn’t necessarily mean that a particular estimate is close to the true
parameter value.
Note also that the standard deviation from a sampling distribution is often called the
standard error.
Variance
1 V[Y1] = σ²
2 V[½(Y1 + Yn)] = ¼·V[Y1 + Yn] = ¼(σ² + σ²) = σ²/2
3 V[7] = 0
4 V[Ȳn] = (1/n²)·nσ² = σ²/n
Among the unbiased estimators, the sample average has the smallest variance. This means that Estimator 4 (the sample average) is likely to be closer to the true value µ than Estimators 1 and 2.
In order to fully understand this, it is helpful to again look at the sampling distributions.
[Figure: simulated sampling distributions of muHat1, muHat2, muHat3, and muHat4 (repeated)]
∑ᵢ(xᵢ − a)² = ∑ᵢ{(xᵢ − x̄) + (x̄ − a)}²
            = ∑ᵢ{(xᵢ − x̄)² + 2(x̄ − a)(xᵢ − x̄) + (x̄ − a)²}
            = ∑ᵢ(xᵢ − x̄)² + 2(x̄ − a)∑ᵢ(xᵢ − x̄) + n(x̄ − a)²
            = ∑ᵢ(xᵢ − x̄)² + n(x̄ − a)²

where all sums run over i = 1, …, n; the cross term vanishes because ∑ᵢ(xᵢ − x̄) = 0.
Show that X̄ is the best linear unbiased estimator for µ (i.e. the smallest-variance estimator among linear unbiased estimators).
1 Use E[∑ᵢ wᵢXᵢ] = µ to derive something about ∑ᵢ wᵢ.
2 Simplify V[∑ᵢ wᵢXᵢ].
3 Write each wᵢ in this simplified expression as 1/n + cᵢ.
4 ...
MSE is the expected squared difference between the estimator and the parameter.
MSE is not the squared difference between an estimate and the parameter.
Furthermore, MSE can be written as the Bias squared plus the Variance.
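Written out, for an estimator θ̂ of θ:

MSE(θ̂) = E[(θ̂ − θ)²] = Bias(θ̂)² + V[θ̂]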
Example
Assume an i.i.d. sample and recall the two possible definitions of sample variance:
S0n² = (1/n) ∑ᵢ (Xᵢ − X̄n)²

S1n² = (1/(n − 1)) ∑ᵢ (Xᵢ − X̄n)²
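A quick simulation makes the difference concrete. This sketch (normal data, σ² = 4, and n = 5 are arbitrary choices) estimates the expectation of each estimator under repeated sampling; only S1n² is unbiased for σ², though the bias of S0n² vanishes as n grows:

import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 5, 100_000, 4.0  # small n makes the bias visible

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)
ss = ((x - xbar) ** 2).sum(axis=1)

s0_sq = ss / n          # divides by n: biased, E = (n-1)/n * sigma^2
s1_sq = ss / (n - 1)    # divides by n-1: unbiased

print(f"E[S0^2] ~= {s0_sq.mean():.3f}  (theory: {(n-1)/n*sigma2:.3f})")
print(f"E[S1^2] ~= {s1_sq.mean():.3f}  (theory: {sigma2:.3f})")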
Asymptotic Unbiasedness
E[θ̂n] → θ as n → ∞
[Figure: sampling densities f(θ̂) for n = 1, n = 10, and n = 100]
Consistency
θ̂n →p θ
[Figure: sampling densities f(X̄n) for n = 1, n = 10, and n = 100]
An estimator θ̂n with a possibly unknown sampling distribution has asymptotic sampling distribution F if
1 θ̂n has a sampling distribution described by cdf Fn, and
2 Fn →d F as n → ∞.
[Figure: sampling distributions of muHat4 for n = 1, 2, 10, and 30]
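The pattern in that figure is easy to reproduce. A sketch, again using a hypothetical skewed income density (gamma with mean 16) since the original data are not included: for each n, simulate many samples, compute muHat4 = Ȳn, and watch the sampling distribution tighten around the mean.

import numpy as np

rng = np.random.default_rng(2)
reps = 10_000

for n in (1, 2, 10, 30):
    # Hypothetical skewed "income" density (gamma, mean 16), as before.
    mu_hat4 = rng.gamma(4.0, 4.0, size=(reps, n)).mean(axis=1)
    q05, q50, q95 = np.percentile(mu_hat4, [5, 50, 95])
    print(f"n = {n:2d}: 5% = {q05:5.1f}, median = {q50:5.1f}, 95% = {q95:5.1f}")
# The 5%-95% spread shrinks toward the mean (16) as n grows.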
An interval estimate takes the form [θ̂lower, θ̂upper].
Example: Party ID
QUESTION:
---------
Generally speaking, do you usually think of yourself as a
REPUBLICAN, a DEMOCRAT, an INDEPENDENT, or what?
Would you call yourself a STRONG [Democrat/Republican] or
a NOT VERY STRONG [Democrat/Republican]?
Do you think of yourself as CLOSER to the Republican
Party or to the Democratic party?
VALID CODES:
------------
0. Strong Democrat (2/1/.)
1. Weak Democrat (2/5-8-9/.)
2. Independent-Democrat (3-4-5/./5)
3. Independent-Independent
(3/./3-8-9 ; 5/./3-8-9 if not apolitical)
4. Independent-Republican (3-4-5/./1)
5. Weak Republican (1/5-8-9/.)
6. Strong Republican (1/1/.)
Let X be a discrete random variable describing PID with the following distribution.
x      0    1    2    3    4    5    6
f(x)  .16  .15  .17  .10  .12  .14  .16
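As a sanity check, the mean and standard deviation of X can be computed directly from this table; a short Python sketch:

import numpy as np

x = np.arange(7)                                   # PID codes 0..6
f = np.array([.16, .15, .17, .10, .12, .14, .16])  # f(x) from the table

mu = (x * f).sum()                         # E[X]
sigma = np.sqrt(((x - mu) ** 2 * f).sum())  # sd of X
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu = 2.93, sigma ~ 2.08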
Interval Estimates

[Figure: interval estimates from 10 samples plotted on the PID scale, 0 to 6]
============================================================================
B1. INTRO THERMOMETERS PRE
============================================================================
[Figure: histograms of the hcFTS and jeFTS feeling thermometer scores, 0 to 100]
[Figure: interval estimates of µ̂ for repeated samples, on the 0 to 100 scale]
Coverage Probability
Coverage probability is the probability that an interval estimator contains the true value
of the parameter.
P(θ̂lower ≤ θ ≤ θ̂upper ) = 1 − α
Question:
What is the probability that an interval estimate contains the true value of the parameter? For example,

[x̄ − 1.96·s/√n, x̄ + 1.96·s/√n]
(µ̂ − µ)/(σ/√n) ∼ N(0, 1)

P(−1.96 ≤ (µ̂ − µ)/(σ/√n) ≤ 1.96) = 95%

P(µ̂ − 1.96·σ/√n ≤ µ ≤ µ̂ + 1.96·σ/√n) = 95%

µ̂ ± 1.96·σ/√n
P(−1.96 ≤ (µ̂ − µ)/(σ/√n) ≤ 1.96) = 95%

P(−zα/2 ≤ (µ̂ − µ)/(σ/√n) ≤ zα/2) = 100(1 − α)%

P(µ̂ − zα/2·σ/√n ≤ µ ≤ µ̂ + zα/2·σ/√n) = 100(1 − α)%

We usually construct the 100(1 − α)% confidence interval with the following formula:

µ̂ ± zα/2·σ/√n
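A simulation can verify the claimed coverage. This sketch (normal data with known σ; the values of µ, σ, and n are arbitrary choices) constructs x̄ ± 1.96·σ/√n over many repeated samples and counts how often the interval contains µ:

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 50.0, 10.0, 25, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
half = 1.96 * sigma / np.sqrt(n)   # half-width, sigma known

covered = (xbar - half <= mu) & (mu <= xbar + half)
print(f"empirical coverage: {covered.mean():.3f}")  # close to 0.95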
Question:
Why not 100% confidence?
Suppose we model JE FTS scores as normally distributed with σ unknown. Recall that if X1, ..., Xn ∼ i.i.d. N(µ, σ²), then

(µ̂ − µ)/(σ/√n) ∼ N(0, 1)
Question:
Why can’t our previous interval be used?
µ̂ ± zα/2·σ/√n

requires the unknown σ. We can instead estimate the standard error:

ŜE[µ̂] = S/√n
If Z ∼ N(0, 1) and Y ∼ χ²ν are independent, then

X ≡ Z/√(Y/ν)

follows a tν distribution.
If a sample (X1, . . . , Xn) of any size n is taken from a normal distribution with known mean and unknown variance, then the sampling distribution of the sample mean minus the known mean, divided by the estimated standard error, will have the t distribution with ν = n − 1 degrees of freedom.
100(1 − α)% t-Intervals

(µ̂ − µ)/(σ̂/√n) ∼ tn−1

P(−tn−1,α/2 ≤ (µ̂ − µ)/(σ̂/√n) ≤ tn−1,α/2) = 100(1 − α)%

P(µ̂ − tn−1,α/2·σ̂/√n ≤ µ ≤ µ̂ + tn−1,α/2·σ̂/√n) = 100(1 − α)%

We usually construct the 100(1 − α)% confidence interval with the following formula:

µ̂ ± tn−1,α/2·σ̂/√n
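With scipy this is a few lines. A sketch, where the data are fabricated placeholders standing in for the JE FTS scores (which are not reproduced in the notes):

import numpy as np
from scipy import stats

# Placeholder data standing in for JE FTS scores (0-100 thermometer).
y = np.array([55, 70, 40, 85, 60, 30, 75, 50, 65, 45], dtype=float)

n = len(y)
alpha = 0.05
mu_hat = y.mean()
se_hat = y.std(ddof=1) / np.sqrt(n)            # S / sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{n-1, alpha/2}

lower, upper = mu_hat - t_crit * se_hat, mu_hat + t_crit * se_hat
print(f"{100*(1-alpha):.0f}% CI for mu: [{lower:.1f}, {upper:.1f}]")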
Without making an assumption about the population distribution, we will often not know
the sampling distribution of the interval estimator, and therefore, we will not know the
coverage probability.
P(θ̂lower,n ≤ θ ≤ θ̂upper,n) → 1 − α as n → ∞
If

(µ̂n − µ)/(1/√n) →d N(0, σ²)

and

σ̂n →p σ,

then it can be shown (e.g. via Slutsky's theorem) that

(µ̂n − µ)/(σ̂n/√n) →d N(0, 1).

Therefore, our normal-quantile confidence intervals will have valid asymptotic coverage (as will t-quantile intervals).
[Figure: densities of the t1, t4, and t15 distributions]
[Figure: estimated sampling densities of µ̂ for the Clinton and Edwards feeling thermometer scores]
Suppose we can somehow model the probabilities for the various outcomes conditional
on the true state of the world.
We would like α (the probability of Type I error) and β (the probability of Type II error) to be small, but it may be difficult to achieve both goals.
The standard statistical approach is to pick a small level for α (e.g. 5%), and then try to
minimize β given this constraint.
As in our previous example, let µ be the expected value of JE FTS for the population.
Let's assume the population mean for HC FTS is 55 (i.e. equal to the sample mean).
Here are two possible hypothesis tests:

H0: µ = 55
H1: µ ≠ 55

H0: µ ≤ 55
H1: µ > 55
Test Statistics
A test statistic is a function of the sample and the null hypothesis (and may provide
evidence against the null hypothesis).
Examples:
1 If H0: µ = 55, then X̄ − 55 would be a test statistic.
2 If H0: µ ≤ 55, then X̄ − 55 would be a test statistic.
Why does the second test statistic make sense given the inequality in the null
hypothesis?
Let µ0 be the “null” value of the parameter µ (e.g. 55). Then the one sample t-statistic
can be written as the following:
(X̄ − µ0)/(S/√n)
Notice that being a function of the sample, this t-statistic will have a sampling
distribution.
A null distribution is the sampling distribution for the test statistic when the null
hypothesis is true. More exactly, the null distribution is the sampling distribution for the
test statistic when θ = θ0 .
For our example, the null distribution is the sampling distribution of the t-statistic
(X̄ − 55)/(S/√n)
when µ = 55.
Suppose we model JE FTS scores as normally distributed with σ unknown. Recall that
if X1 , ..., Xn ∼i.i.d. N(µ, σ 2 ) , then
(X̄ − 55)/(S/√n) ∼ tn−1
when µ = 55.
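One way to see this null distribution concretely is to simulate it: draw repeated samples with µ = 55, compute the t-statistic each time, and compare the simulated quantiles to those of tn−1. A sketch (n and σ are arbitrary choices):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 20, 50_000
mu0 = 55.0

# Draw samples under the null (mu = 55); sigma = 15 is an arbitrary choice.
x = rng.normal(mu0, 15.0, size=(reps, n))
t_stats = (x.mean(axis=1) - mu0) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# Compare simulated quantiles to the theoretical t quantiles.
for q in (0.025, 0.5, 0.975):
    sim = np.quantile(t_stats, q)
    theo = stats.t.ppf(q, df=n - 1)
    print(f"q = {q}: simulated {sim:6.2f}, theoretical {theo:6.2f}")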
Null Distribution

[Figure: the null distribution of the test statistic (a t density from −3 to 3)]
p-Value
The p-value is the probability under the null distribution of getting a sample at least as
extreme as the one we got.
Examples:

H1: µ ≠ 55 ⇒ p-value = P(tstat ≥ |tobs| ∪ tstat ≤ −|tobs| | µ = 55)

H1: µ > 55 ⇒ p-value = P(tstat ≥ tobs | µ = 55)
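Both p-values can be computed from the tail probabilities of the tn−1 distribution. A sketch, reusing the fabricated FTS placeholder data from the t-interval example above:

import numpy as np
from scipy import stats

y = np.array([55, 70, 40, 85, 60, 30, 75, 50, 65, 45], dtype=float)
n, mu0 = len(y), 55.0

t_obs = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))

# Two-sided: P(|t| >= |t_obs|) under t_{n-1}
p_two = 2 * stats.t.sf(abs(t_obs), df=n - 1)
# One-sided (H1: mu > 55): P(t >= t_obs)
p_one = stats.t.sf(t_obs, df=n - 1)

print(f"t_obs = {t_obs:.3f}, two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")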
[Figure: two-sided p-value: shaded area beyond tobs and −tobs under the null density]

[Figure: one-sided p-value: shaded area beyond tobs under the null density]
Rejection Regions
Recall that α is the probability of Type I Error. Often we want to limit α to 5% while
minimizing the probability of Type II Error. This can be accomplished in the following
manner.
[Figure: two-sided rejection region (α = 5%), with fences at the critical values and tobs marked]
[Figure: one-sided rejection region (α = 5%), with a fence at the critical value and tobs marked]
[Figure: two-sided rejection region (α = 5%), with fences and both tobs and −tobs marked]
[Figure: one-sided rejection region (α = 5%), with a fence and tobs marked]
Rejection Regions and CIs (α = 5%)

[Figure: rejection-region fences and the confidence interval shown together against f(x̄ | H0), for x̄ from 50 to 60]
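The figure illustrates the duality between tests and confidence intervals: a level-α two-sided t-test rejects H0: µ = µ0 exactly when µ0 falls outside the 100(1 − α)% t-interval. A sketch checking this numerically with the same fabricated placeholder data as above:

import numpy as np
from scipy import stats

y = np.array([55, 70, 40, 85, 60, 30, 75, 50, 65, 45], dtype=float)
n, alpha = len(y), 0.05
se = y.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci = (y.mean() - t_crit * se, y.mean() + t_crit * se)

for mu0 in (50.0, 55.0, 70.0):
    t_obs = (y.mean() - mu0) / se
    reject = abs(t_obs) > t_crit             # test decision
    outside = not (ci[0] <= mu0 <= ci[1])    # CI decision
    assert reject == outside                 # the duality holds
    print(f"mu0 = {mu0}: reject = {reject}, outside CI = {outside}")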