Introduction to Inference
6.1 Estimating with Confidence
6.2 Tests of Significance
6.3 Use and Abuse of Tests
6.4 Power and Inference as a Decision
CHAPTER 6 Introduction to Inference

Introduction
The purpose of statistical inference is to draw conclusions from data. We have examined data and arrived at conclusions many times previously. Formal inference adds an emphasis on substantiating our conclusions by probability calculations.

Probability allows us to take chance variation into account and so to correct our judgment by calculation. Here is an example.
EXAMPLE 6.1 The Wade Tract in Thomas County, Georgia, is an old-growth forest of longleaf pine trees (Pinus palustris) that has survived in a relatively undisturbed state since before the settlement of the area by Europeans. Foresters who study these trees are interested in how the trees are distributed in the forest. Are the locations of the trees random, with no particular patterns? Or is there some sort of clustering, resulting in regions that have more trees than others? Figure 6.1 gives a plot of the locations of all 584 longleaf pine trees in a 200-meter by 200-meter region in the Wade Tract. Do the locations appear to be random, or do there appear to be clusters of trees? One approach to the analysis of these data indicates that a pattern as clustered as or more clustered than the one in Figure 6.1 would occur only 4% of the time if, in fact, the locations of longleaf pine trees in the Wade Tract are random.
FIGURE 6.1 The distribution of longleaf pine trees in a 200-meter by 200-meter region of the Wade Tract, for Example 6.1. (The axes give East-West and North-South position.)

Because this chance is fairly small, we conclude that there is some clustering of these trees. Our probability calculation helps us to distinguish
between patterns that are consistent or inconsistent with the random location
scenario.
Our unaided judgment can also err in the opposite direction, seeing a systematic effect when only chance variation is at work. Give a new drug and a placebo to 20 patients each; 12 of those taking the drug show improvement, but only 8 of the placebo patients improve. Is the drug more effective than the placebo? Perhaps, but a difference this large or larger between the results in the two groups would occur about one time in five simply because of chance variation. An effect that could so easily be just chance is not convincing.
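The "one time in five" figure can be made concrete with a short calculation. The sketch below (an illustration, not necessarily the computation behind the text's figure) asks: if the 20 improvements among the 40 patients were distributed at random between the two groups of 20, how often would the drug group get 12 or more of them?

```python
from math import comb

# Under pure chance, the number of improvers landing in the drug group
# follows a hypergeometric distribution: 20 improvers among 40 patients,
# with 20 patients assigned to the drug group.
total = comb(40, 20)
p_extreme = sum(comb(20, k) * comb(20, 20 - k) for k in range(12, 21)) / total
print(round(p_extreme, 2))  # 0.17 -- roughly one time in five
```

The exact probability is about 0.17, which the text rounds to one in five.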
In this chapter we introduce the two most prominent types of formal statistical inference. Section 6.1 concerns confidence intervals for estimating the value of a population parameter. Section 6.2 presents tests of significance, which assess the evidence for a claim. Both types of inference are based on the sampling distributions of statistics. That is, both report probabilities that state what would happen if we used the inference method many times. This kind of probability statement is characteristic of standard statistical inference. Users of statistics must understand the nature of the reasoning employed and the meaning of the probability statements that appear, for example, on computer output for statistical procedures.
Because the methods of formal inference are based on sampling distributions, they require a probability model for the data. Trustworthy probability models can arise in many ways, but the model is most secure and inference is most reliable when the data are produced by a properly randomized design. When you use statistical inference, you are acting as if the data come from a random sample or a randomized experiment. If this is not true, your conclusions may be open to challenge. Do not be overly impressed by the complex details of formal inference. This elaborate machinery cannot remedy basic flaws in producing the data such as voluntary response samples and confounded experiments. Use the common sense developed in your study of the first three chapters of this book, and proceed to detailed formal inference only when you are satisfied that the data deserve such analysis.
The primary purpose of this chapter is to describe the reasoning used in statistical inference. We will discuss only a few specific inference techniques, and these require an unrealistic assumption: that we know the standard deviation σ. Later chapters will present inference methods for use in most of the settings we met in learning to explore data. There are libraries—both of books and of computer software—full of more elaborate statistical techniques. Informed use of any of these methods requires an understanding of the underlying reasoning. A computer will do the arithmetic, but you must still exercise judgment based on understanding.
6.1 Estimating with Confidence
The SAT tests are widely used measures of readiness for college study. There are two parts, one for verbal reasoning ability (SATV) and one for mathematical reasoning ability (SATM). In April 1995, the scores were recentered so that the mean is approximately 500 in a large "standardization group." This scale is maintained from year to year so that scores have a constant interpretation. In 2003, 1,406,324 college-bound seniors took the SAT. Their mean SATV score was 507 with a standard deviation of 111. For the SATM the mean was 519 with a standard deviation of 115.
EXAMPLE 6.2 You want to estimate the mean SATM score for the more than 385,000 high school seniors in California. You know better than to trust data from the students who choose to take the SAT. Only about 45% of California students take the SAT. These self-selected students are planning to attend college and are not representative of all California seniors. At considerable effort and expense, you give the test to a simple random sample (SRS) of 500 California high school seniors. The mean score for your sample is x̄ = 461. What can you say about the mean score μ in the population of all 385,000 seniors?
The sample mean x̄ is the natural estimator of the unknown population mean μ. We know that x̄ is an unbiased estimator of μ. More important, the law of large numbers says that the sample mean must approach the population mean as the size of the sample grows. The value x̄ = 461 therefore appears to be a reasonable estimate of the mean score μ that all 385,000 students would achieve if they took the test. But how reliable is this estimate? A second sample would surely not give 461 again. Unbiasedness says only that there is no systematic tendency to underestimate or overestimate the truth. Could we plausibly get a sample mean of 410 or 510 on repeated samples? An estimate without an indication of its variability is of little value.
Statistical confidence
Just as unbiasedness of an estimator concerns the center of its sampling distribution, questions about variation are answered by looking at the spread. We know that if the entire population of SAT scores has mean μ and standard deviation σ, then in repeated samples of size 500 the sample mean follows the N(μ, σ/√500) distribution. Let us suppose that we know that the standard deviation σ of SATM scores in our California population is σ = 100. (This is not realistic. We will see in the next chapter how to proceed when σ is not known. For now, we are more interested in statistical reasoning than in details of realistic methods.) In repeated sampling the sample mean x̄ has a normal distribution centered at the unknown population mean μ and having standard deviation

σ/√500 = 100/√500 = 4.5
Now we are in business. Consider this line of thought, which is illustrated by Figure 6.2:

• The 68-95-99.7 rule says that the probability is about 0.95 that x̄ will be within 9 points (two standard deviations of x̄) of the population mean score μ.

• To say that x̄ lies within 9 points of μ is the same as saying that μ is within 9 points of x̄.

• So 95% of all samples will capture the true μ in the interval from x̄ − 9 to x̄ + 9.
FIGURE 6.2 x̄ lies within ±9 of μ in 95% of all samples, so μ also lies within ±9 of x̄ in those samples.
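The 9-point figure comes from doubling the standard deviation of x̄, and the arithmetic is easy to check directly (a sketch using the σ = 100, n = 500 values assumed above):

```python
from math import sqrt

sigma = 100   # assumed known standard deviation of SATM scores
n = 500       # sample size
se = sigma / sqrt(n)        # standard deviation of the sample mean
print(round(se, 2))         # 4.47
print(round(2 * se, 1))     # 8.9, which the text rounds to 9 points
```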
We have simply restated a fact about the sampling distribution of x̄. The language of statistical inference uses this fact about what would happen in the long run to express our confidence in the results of any one sample. Our sample gave x̄ = 461. We say that we are 95% confident that the unknown mean score for all California seniors lies between

x̄ − 9 = 461 − 9 = 452

and

x̄ + 9 = 461 + 9 = 470
Be sure you understand the grounds for our confidence. There are only two possibilities:

1. The interval between 452 and 470 contains the true μ.

2. Our SRS was one of the few samples for which x̄ is not within 9 points of the true μ. Only 5% of all samples give such inaccurate results.

We cannot know whether our sample is one of the 95% for which the interval x̄ ± 9 catches μ or one of the unlucky 5%. The statement that we are 95% confident that the unknown μ lies between 452 and 470 is shorthand for saying, "We arrived at these numbers by a method that gives correct results 95% of the time."
Confidence intervals

The interval of numbers between the values x̄ ± 9 is called a 95% confidence interval for μ. Like most confidence intervals we will meet, this one has the form

estimate ± margin of error

The estimate (x̄ in this case) is our guess for the value of the unknown parameter. The margin of error ±9 shows how accurate we believe our guess is, based on the variability of the estimate. The confidence level shows how confident we are that the procedure will catch the true population mean μ.
Figure 6.3 illustrates the behavior of 95% confidence intervals in repeated sampling. The center of each interval is at x̄ and therefore varies from sample to sample. The sampling distribution of x̄ appears at the top of the figure to show the long-term pattern of this variation. The 95% confidence intervals x̄ ± 9 from 25 SRSs appear below. The center x̄ of each interval is marked by a dot. The arrows on either side of the dot span the confidence interval. All except one of the 25 intervals cover the true value of μ. In a very large number of samples, 95% of the confidence intervals would contain μ. With the Confidence Interval applet you can construct many diagrams similar to the one displayed in Figure 6.3.
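The long-run behavior pictured in Figure 6.3 is easy to simulate. The sketch below draws many sample means from their sampling distribution (using hypothetical values μ = 461, σ = 100, n = 500 for illustration) and counts how often the interval x̄ ± 1.96σ/√n captures μ:

```python
import random
from math import sqrt

random.seed(2)                   # fixed seed so the demo is reproducible
mu, sigma, n = 461, 100, 500     # hypothetical "true" population values
se = sigma / sqrt(n)             # standard deviation of the sample mean

trials = 10_000
covered = sum(
    1
    for _ in range(trials)
    # the interval covers mu exactly when x-bar falls within 1.96 se of mu
    if abs(random.gauss(mu, se) - mu) <= 1.96 * se
)
print(covered / trials)  # close to 0.95
```

The proportion of covering intervals settles near 0.95, which is what the confidence level promises.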
FIGURE 6.3 Twenty-five samples from the same population gave these 95% confidence intervals. In the long run, 95% of all samples give an interval that covers μ.

Statisticians have constructed confidence intervals for many different parameters based on a variety of designs for data collection. We will meet
a number of these in later chapters. You need to know two important things about a confidence interval:

1. It is an interval of the form (a, b), where a and b are numbers computed from the data.

2. It has a property called a confidence level that gives the probability that the interval covers the parameter.

Users can choose the confidence level, but 95% is the standard for most situations. Occasionally, 90% or 99% is used. We will use C to stand for the confidence level in decimal form. For example, a 95% confidence level corresponds to C = 0.95.
CONFIDENCE INTERVAL
A level C confidence interval for a parameter is an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter.
Confidence interval for a population mean

We will now construct a level C confidence interval for the mean μ of a population when the data are an SRS of size n. The construction is based on the sampling distribution of the sample mean x̄. This distribution is exactly N(μ, σ/√n) when the population has the N(μ, σ) distribution. The central limit theorem says that this same sampling distribution is approximately correct for large samples whenever the population mean and standard deviation are μ and σ.
Our construction of a 95% confidence interval for the mean SAT score began by noting that any normal distribution has probability about 0.95 within ±2 standard deviations of its mean. To construct a level C confidence interval we first catch the central C area under a normal curve. That is, we must find the number z* such that any normal distribution has probability C within ±z* standard deviations of its mean. Because all normal distributions have the same standardized form, we can obtain everything we need from the standard normal curve. Figure 6.4 shows how C and z* are related. Values of z* for many choices of C appear in the row labeled z* at the bottom of Table D at the back of the book. Here are the most important entries from that row:

z*    1.645    1.960    2.576
C     90%      95%      99%
Any normal curve has probability C between the point z* standard deviations below the mean and the point z* standard deviations above the mean, as Figure 6.4 reminds us. The sample mean x̄ has the normal distribution with mean μ and standard deviation σ/√n. So there is probability C that x̄ lies between

μ − z*σ/√n   and   μ + z*σ/√n
FIGURE 6.4 The area between −z* and z* under the standard normal curve is C.
This is exactly the same as saying that the unknown population mean μ lies between

x̄ − z*σ/√n   and   x̄ + z*σ/√n

That is, there is probability C that the interval x̄ ± z*σ/√n contains μ. That is our confidence interval. The estimate of the unknown μ is x̄, and the margin of error is z*σ/√n.
CONFIDENCE INTERVAL FOR A POPULATION MEAN

Choose an SRS of size n from a population having unknown mean μ and known standard deviation σ. The margin of error for a level C confidence interval for μ is

m = z*σ/√n

Here z* is the value on the standard normal curve with area C between the critical points −z* and z*. The level C confidence interval for μ is

x̄ ± m

This interval is exact when the population distribution is normal and is approximately correct for large n in other cases.
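The boxed recipe translates directly into code. Here is a minimal sketch (the function name and the small z* lookup table are ours, not the book's):

```python
from math import sqrt

# z* values from the bottom row of Table D
Z_STAR = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def mean_ci(xbar, sigma, n, level=0.95):
    """Level-C confidence interval for mu with sigma known: xbar +/- z* sigma/sqrt(n)."""
    m = Z_STAR[level] * sigma / sqrt(n)
    return xbar - m, xbar + m

# The SATM example: xbar = 461, sigma = 100, n = 500
low, high = mean_ci(461, 100, 500)
print(round(low), round(high))  # 452 470, matching the interval found earlier
```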
EXAMPLE 6.3 The National Student Loan Survey collects data to examine questions related to the amount of money that borrowers owe. The survey selected a sample of 1280 borrowers who began repayment of their loans between four and six months prior to the study. The mean of the debt for undergraduate study was $18,900
and the standard deviation was about $49,000. This distribution is clearly skewed, but because our sample size is quite large, we can rely on the central limit theorem to assure us that the confidence interval based on the normal distribution will be a good approximation. Let's compute a 95% confidence interval for the true mean debt for all borrowers. Although the standard deviation is estimated from the data collected, we will treat it as a known quantity for our calculations here.
For 95% confidence, we see from Table D that z* = 1.960. The margin of error for the 95% confidence interval for μ is therefore

m = z*σ/√n = 1.960 × 49,000/√1280 = 2684.4

We have computed the margin of error with more digits than we really need. Our mean is rounded to the nearest $100, so we will do the same for the margin of error. Therefore, we will use m = 2700. The 95% confidence interval is

x̄ ± m = 18,900 ± 2700
      = (16,200, 21,600)

We are 95% confident that the mean debt is between $16,200 and $21,600.
Note that we have rounded the results to the nearest $100. Keeping additional digits would provide no additional useful information.

In this example we used a value for σ based on a large sample. Because of this, the confidence interval that we calculated will be the same as the one that we would compute using the methods discussed in the next chapter, where we no longer need to assume that σ is known.
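The arithmetic of Example 6.3 can be reproduced in a few lines (a sketch using the example's numbers):

```python
from math import sqrt

xbar, sigma, n = 18_900, 49_000, 1280   # values from Example 6.3
m = 1.960 * sigma / sqrt(n)             # margin of error for 95% confidence
print(round(m))                         # 2684, which the text rounds to 2700

m = 2700                                # round to the nearest $100, as the mean was
print(xbar - m, xbar + m)               # 16200 21600
```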
Suppose the researchers who designed the National Student Loan Survey had used a different sample size. How would this affect the confidence interval? We can answer this question by changing the sample size in our calculations and assuming that the mean and standard deviation are the same.

EXAMPLE 6.4 Let's assume that the sample mean of the debt for undergraduate study is $18,900 and the standard deviation is about $49,000, as in Example 6.3. But suppose that the sample size is only 320. The margin of error for 95% confidence is
m = z*σ/√n = 1.960 × 49,000/√320 = 5368.9

Rounding to the nearest $100 as before gives m = 5400, and the 95% confidence interval is

x̄ ± m = 18,900 ± 5400
      = (13,500, 24,300)
FIGURE 6.5 Confidence intervals for n = 1280 and n = 320, for Examples 6.3 and 6.4.
Notice that the margin of error for this example is twice as large as the margin of error that we computed in Example 6.3. The only change that we made was to assume that the sample size is 320 rather than 1280. The new sample size is exactly one-fourth of the original 1280. We double the margin of error when we reduce the sample size to one-fourth of the original value. Figure 6.5 illustrates the effect in terms of the intervals.
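The doubling is no accident: the margin of error scales as 1/√n, so quartering the sample size exactly doubles the margin. A quick check with the example's numbers:

```python
from math import sqrt

sigma = 49_000
m_1280 = 1.960 * sigma / sqrt(1280)   # margin of error in Example 6.3
m_320 = 1.960 * sigma / sqrt(320)     # margin of error in Example 6.4
print(round(m_320 / m_1280, 3))       # 2.0 -- exactly double
```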
The argument leading to the form of confidence intervals for the population mean μ rested on the fact that the statistic used to estimate μ has a normal distribution. Because many sample estimates have normal distributions (at least approximately), it is useful to notice that the confidence interval has the form

estimate ± z*σ_estimate

The estimate based on the sample is the center of the confidence interval. The margin of error is z*σ_estimate. The desired confidence level determines z* from Table D. The standard deviation of the estimate is found from a knowledge of the sampling distribution in a particular case. When the estimate is x̄ from an SRS, the standard deviation of the estimate is σ/√n.
How confidence intervals behave

The margin of error z*σ/√n for the mean of a normal population illustrates several important properties that are shared by all confidence intervals in common use. The user chooses the confidence level, and the margin of error follows from this choice. High confidence is desirable and so is a small margin of error. High confidence says that our method almost always gives correct answers. A small margin of error says that we have pinned down the parameter quite precisely.

Suppose that you calculate a margin of error and decide that it is too large. Here are your choices to reduce it:

• Use a lower level of confidence (smaller C).
• Increase the sample size (larger n).
• Reduce σ.
For most problems you would choose a confidence level of 90%, 95%, or 99%. So z* can be 1.645, 1.960, or 2.576. A look at Figure 6.4 will convince you that z* will be smaller for lower confidence (smaller C). Table D shows that this is indeed the case. If n and σ are unchanged, a smaller z* leads to a smaller margin of error. Similarly, increasing the sample size n reduces the margin of error for any fixed confidence level.

FIGURE 6.6 Confidence intervals for Examples 6.3 and 6.5.

The square root in the formula
implies that we must multiply the number of observations by 4 in order to cut the margin of error in half. The standard deviation σ measures the variation in the population. You can think of the variation among individuals in the population as noise that obscures the average value μ. It is harder to pin down the mean μ of a highly variable population; that is why the margin of error of a confidence interval increases with σ. In practice, we can sometimes reduce σ by carefully controlling the measurement process or by restricting our attention to only part of a large population.
EXAMPLE 6.5 Suppose that for the student loan data in Example 6.3, we wanted 99% confidence rather than 95%. Table D tells us that for 99% confidence, z* = 2.576. The margin of error for 99% confidence based on 1280 observations is

m = z*σ/√n = 2.576 × 49,000/√1280 = 3528.1

Rounding to the nearest $100 gives m = 3500, and the 99% confidence interval is

x̄ ± m = 18,900 ± 3500
      = (15,400, 22,400)

Requiring 99%, rather than 95%, confidence has increased the margin of error from 2700 to 3500. Figure 6.6 compares the two intervals.
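The trade-off in Example 6.5 is again a one-line computation (a sketch with the example's numbers):

```python
from math import sqrt

sigma, n = 49_000, 1280
se = sigma / sqrt(n)          # standard deviation of the sample mean
m95 = 1.960 * se              # margin of error at 95% confidence
m99 = 2.576 * se              # margin of error at 99% confidence
# Rounded to the nearest $100, as in the text:
print(round(m95, -2), round(m99, -2))  # 2700.0 3500.0
```

Higher confidence costs a wider interval: the z* multiplier grows from 1.960 to 2.576.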
Choosing the sample size
A wise user of statistics never plans data collection without at the same time planning the inference. You can arrange to have both high confidence and a small margin of error. The margin of error of the confidence interval x̄ ± z*σ/√n