
MATH& 146

Lesson 14
Section 2.3
The Hypothesis Test Procedure

1
Make-Up Final
A (semi) well-known story* goes something like this:

Four students missed the final exam for their statistics class. They went to the professor and said, "Please, oh please, let us make up the final. We carpool together, and on our way to the class, we got a flat tire, and that's why we missed the exam."

* Discussion of this story can be found at http://www.snopes.com/college/exam/flattire.asp
2
Make-Up Final
The professor didn't believe them, but instead of
arguing he said, "Sure, you make up the exam. Be
in my office tomorrow at 8."

The next day, they met in the office. He sent each student to a separate room and gave them an exam. The exam consisted of only one question:

"Which tire?"

3
Make-Up Final
We don't know the outcome of this story, but let's
imagine that all four students answered, "left rear
tire."
The professor would be surprised. He had
assumed that the students were lying. "Maybe,"
he thinks, "they just got lucky. After all, if they just
guessed, they could still all choose the same tire."

4
Make-Up Final
But then he does a quick calculation and figures
out that the probability that all four students will
guess the same tire is only 1.6%.
Reluctantly, he concedes that the students were
probably not lying, and now he must give all of
them an A on the exam.

P(all four choose the same tire)
    = P(second student chooses the same tire as the first)
      × P(third student chooses the same tire as the first)
      × P(fourth student chooses the same tire as the first)
    = (1/4) × (1/4) × (1/4) = 1/64 = 0.015625

5
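The professor's 1.6% figure is easy to check. Below is a minimal Python sketch (not part of the original slides) that computes the exact probability and also estimates it by simulation; the tire names and trial count are illustrative choices.

```python
import random

# Exact probability: the first student's answer can be anything;
# each of the other three must independently match it (1 chance in 4 each).
exact = (1 / 4) ** 3
print(f"Exact probability: {exact:.6f}")   # 0.015625, about 1.6%

# Monte Carlo check: simulate many groups of four guessing students.
random.seed(1)
tires = ["left front", "right front", "left rear", "right rear"]
trials = 100_000
all_match = sum(
    len({random.choice(tires) for _ in range(4)}) == 1   # True when all four answers agree
    for _ in range(trials)
)
print(f"Simulated probability: {all_match / trials:.4f}")
```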
Hypothesis Testing
The statistics professor has just performed a
hypothesis test. Hypothesis testing is a formal
procedure that enables us to choose between two
hypotheses when we are uncertain about our
measurements.
Hypothesis testing is called a formal procedure
because it is based on particular terminology and a
rather well-specified set of steps.

6
Hypothesis Testing
To perform a hypothesis test:
1) Set up two contradictory hypotheses.
2) Collect sample data.
3) Determine the correct test and check the
conditions, noting any assumptions that were
made.
4) Analyze sample data by performing the calculations
that ultimately will support one of the hypotheses.
5) Make a decision and write a meaningful conclusion.
7
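To make the five steps concrete, here is a hedged Python sketch that walks through them for a hypothetical one-proportion question (do more than half of commuting students carpool?). The survey numbers, the 0.05 cutoff, and the use of an exact binomial calculation are illustrative assumptions, not part of the slides; p-values and significance levels are defined in the slides that follow.

```python
from math import comb

# Step 1: Set up two contradictory hypotheses (hypothetical example).
#   H0: p = 0.5   (half of commuting students carpool)
#   HA: p > 0.5   (more than half carpool)
p0 = 0.5

# Step 2: Collect sample data (made-up numbers for illustration).
n, successes = 80, 49          # 49 of 80 surveyed students carpool

# Step 3: Determine the correct test and check conditions.
# A one-proportion test fits; here an exact binomial calculation is used,
# assuming the 80 responses are independent.

# Step 4: Analyze: p-value = P(49 or more successes) if H0 is true.
p_value = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(successes, n + 1))

# Step 5: Make a decision and write a conclusion (common 0.05 significance level).
alpha = 0.05
if p_value < alpha:
    print(f"p-value = {p_value:.4f}: reject H0; evidence that more than half carpool.")
else:
    print(f"p-value = {p_value:.4f}: fail to reject H0.")
```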
The Hypotheses
In testing the truthfulness of his students, the
statistics professor had two competing claims:
H0: The students were telling the truth.
HA: The students were not telling the truth.
We call H0 the null hypothesis and HA the
alternative hypothesis.

8
The Hypotheses
The null hypothesis, H0, is the conservative, status-
quo, business-as-usual statement about a population
parameter. In the context of researching new ideas,
the null hypothesis often represents "no change," "no
effect," or "no difference."

The alternative hypothesis, HA, is the research hypothesis. It is usually a statement about the value of a parameter that we hope to demonstrate is true.

9
Example 1
State the null and alternative hypotheses.
a) You are testing a new design for airbags used
in automobiles, and you are concerned that
they might not open properly.
b) A US court considers two possible claims about
a defendant: she is either innocent or guilty.

10
Symbols Used in H0 and HA
For us, the two most common parameters are
means and proportions.
Since we are making claims about populations (as
opposed to samples), we will use the symbols μ (for means) and p (for proportions).

             Point Estimate   Parameter
Mean         x̄ (x-bar)        μ (mu)
Proportion   p̂ (p-hat)        p

Use the parameter symbols for hypotheses.

11
Symbols Used in H0 and HA

H0                               HA
equal (=)                        not equal to (≠)
greater than or equal to (≥)     less than (<)
less than or equal to (≤)        more than (>)

To keep it simple, the null hypothesis is almost always written with the equal sign (=). This practice is acceptable because our goal is to prove the alternative hypothesis, not the null.

12
Writing Hypotheses
In general, hypotheses can be written as follows:

H0 : parameter = null value

HA : parameter ≠ null value   (or <, or >, depending on the claim)

13
Example 2
The manufacturer of a new car model advertises that
the car averages 38 miles per gallon on the highway.
A consumer group says that this claim is overstated
and that the actual average (mean) is less than 38
miles per gallon. State the null and alternative
hypotheses for a hypothesis test.
Note: We call 38 the null value since it represents the
value of the parameter if the null hypothesis is true.

14
P-Values
Hypothesis testing is built around rejecting or
failing to reject the null hypothesis. That is, we do
not reject H0 unless we have strong evidence.

But what precisely does strong evidence mean?

15
P-Values
It would be helpful to quantify the strength of the
evidence against the null hypothesis.
In the beginning anecdote, the professor was
trying to determine the truthfulness of his students.
He determined that by chance alone, the
probability that all four students would choose the
same tire was less than 2%, which suggests that
the students were not just guessing.

16
P-Values
When results like these are inconsistent with H0,
we reject H0 in favor of HA. Here, the professor
concluded that the students were not lying.
This 2-in-100 chance is what is called a p-value,
which is the probability quantifying the strength of
the evidence against the null hypothesis and in
favor of the alternative.

17
P-Values
The p-value is the probability of observing data at
least as favorable to the alternative hypothesis as
our current data set, if the null hypothesis is true.
We typically use a summary statistic of the data,
such as the difference in proportions, to help
compute the p-value and evaluate the hypotheses.
This summary value that is used to compute the p-
value is often called the point estimate.

18
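As a sketch of this idea, the Python code below estimates a p-value for a difference in proportions with a simple randomization test: it asks how often shuffled group labels produce a difference at least as favorable to HA as the observed one. The counts are invented to loosely mimic the 20-point difference described in the opportunity cost study; they are not the actual Lesson 13 data.

```python
import random

random.seed(1)

# Hypothetical data: 1 = bought the DVD, 0 = did not (invented counts).
treatment = [1] * 41 + [0] * 34   # reminded of other uses for the money
control   = [1] * 56 + [0] * 19   # not reminded

def prop_diff(a, b):
    """Point estimate: difference in sample proportions (control minus treatment)."""
    return sum(b) / len(b) - sum(a) / len(a)

observed = prop_diff(treatment, control)

# Randomization test: if H0 is true (the reminder has no effect), the group
# labels are arbitrary, so shuffle them and recompute the difference.
pooled = treatment + control
n_t = len(treatment)
count = 0
reps = 10_000
for _ in range(reps):
    random.shuffle(pooled)
    if prop_diff(pooled[:n_t], pooled[n_t:]) >= observed:
        count += 1

# p-value: proportion of shuffles at least as favorable to HA as the data.
print(f"observed difference = {observed:.3f}, estimated p-value = {count / reps:.4f}")
```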
Significance Levels
As a general rule of thumb (more on this later), for
those cases where the null hypothesis is actually
true, we do not want to incorrectly reject H0 more
than 5% of the time.
This corresponds to a significance level of 0.05.
We often write the significance level using α (the Greek letter alpha): α = 0.05.

19
P-Values and Decisions
When the p-value is smaller than the significance
level, we say the results are statistically
significant.
This means the data provide such strong evidence
against H0 that we reject the null hypothesis in
favor of the alternative hypothesis.
Note: While the significance level is often set to
α = 0.05, it can vary depending on the field or the
application.

20
P-Values and Decisions
You can think of α as the boundary between the plausible and implausible null hypothesis.
P-values larger than α indicate that the null hypothesis is plausible, and so we fail to reject.
P-values smaller than α indicate that the null hypothesis is implausible and can be considered evidence to reject.

[Number line: p-values below α fall in the "implausible" region; p-values above α fall in the "plausible" region.]

21
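In code, the decision rule on this slide is just a comparison of the p-value with α. The values below are placeholders; the 0.0156 corresponds to the professor's tire calculation.

```python
alpha = 0.05          # significance level chosen before seeing the data
p_value = 0.0156      # example value: the tire calculation from the opening story

if p_value < alpha:
    print("Statistically significant: reject H0 in favor of HA.")
else:
    print("Not statistically significant: fail to reject H0.")
```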
Fail to Reject
A null hypothesis is not accepted just because it is not rejected. Data that fail to convincingly support the alternative hypothesis do not prove that the null hypothesis is true.
It might be that the null hypothesis is true, or it may be that the sample was too small. For that reason, we never accept the null hypothesis; we only fail to reject it.

22
Example 3
In the opportunity cost study (Lesson 13), we analyzed
an experiment where study participants were 20% less
likely to continue with a DVD purchase if they were
reminded that the money, if not spent on the DVD,
could be used for other purchases in the future. We
determined that such a large difference would only
occur about 1-in-150 times if the reminder actually had
no influence on student decision-making.
What is the p-value in this study? Was the result
statistically significant?

23
So Many p's
Use caution with the symbols:
p is the population proportion.
p0 is the value of the population proportion according to the null hypothesis (the null value).
p̂ (p-hat) is the sample proportion.
The p-value is the probability that if the null
hypothesis is true, our test statistic will be as
extreme as or more extreme than the value we
actually observed.

24
Why 0.05?
We often use a threshold of 0.05 to determine
whether a result is statistically significant. But why
0.05? Maybe we should use a bigger number, or
maybe a smaller number.
If you're a little puzzled, that probably means you're reading with a critical eye. Good job!

25
Why 0.05?
The truth is that 0.05 is an arbitrary, though
universally accepted, cutoff point that dates back
to the early twentieth century and statistics pioneer
R.A. Fisher (1890–1962).
The number 0.05 corresponds to 1-in-20 and roughly marks the boundary between the likely and the unlikely.
Also, as we will see later, 0.05 works nicely with
the normal distribution.

26
Decision Errors
One caution: hypothesis tests are not flawless.
Just think of the court system: innocent people are
sometimes wrongly convicted and the guilty
sometimes walk free. Similarly, we can make a
wrong decision in statistical hypothesis tests.
However, the difference is that we have the tools
necessary to quantify how often we make such
errors.

27
Decision Errors
There are two competing hypotheses: the null and
the alternative. In a hypothesis test, we make a
statement about which one might be true, but we
might choose incorrectly. There are four possible
scenarios in a hypothesis test, which are
summarized below.
                         Test conclusion
Truth             do not reject H0                   reject H0 in favor of HA
H0 true           Correct Outcome (True Negative)    Type 1 Error (False Positive)
HA true           Type 2 Error (False Negative)      Correct Outcome (True Positive)
28
Type 1 Errors
A Type 1 error occurs when the null hypothesis is
true, but random sampling caused your data to
appear to support the alternative. These types of
errors, such as when a person is wrongly
diagnosed as having a disease, are also known as
false positives.
                         Test conclusion
Truth             do not reject H0                   reject H0 in favor of HA
H0 true           Correct Outcome (True Negative)    Type 1 Error (False Positive)
HA true           Type 2 Error (False Negative)      Correct Outcome (True Positive)
29
Type 2 Errors
A Type 2 error occurs when the null hypothesis
really is false, but random sampling caused your
data to not reject the null. These types of errors,
such as when an infected person is given a clean
bill of health, are also known as false negatives.

                         Test conclusion
Truth             do not reject H0                   reject H0 in favor of HA
H0 true           Correct Outcome (True Negative)    Type 1 Error (False Positive)
HA true           Type 2 Error (False Negative)      Correct Outcome (True Positive)
30
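A small simulation can make the two error types tangible. The sketch below (illustrative parameters only; it uses a simple normal-approximation test for a proportion) repeatedly tests H0: p = 0.5 at α = 0.05, first with data generated under H0, where any rejection is a Type 1 error, and then with data generated with p = 0.6, where any failure to reject is a Type 2 error.

```python
import random
from math import sqrt

random.seed(1)

def one_sided_test(true_p, n=100, p0=0.5, z_crit=1.645):
    """Simulate one sample and test H0: p = p0 vs HA: p > p0.
    Uses a normal approximation; 1.645 is the one-sided cutoff for alpha = 0.05."""
    successes = sum(random.random() < true_p for _ in range(n))
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z > z_crit          # True means "reject H0"

reps = 10_000

# Scenario 1: H0 is really true (p = 0.5); any rejection is a Type 1 error.
type1 = sum(one_sided_test(0.5) for _ in range(reps)) / reps

# Scenario 2: HA is really true (p = 0.6); any failure to reject is a Type 2 error.
type2 = sum(not one_sided_test(0.6) for _ in range(reps)) / reps

print(f"Type 1 error rate (should be near alpha = 0.05): {type1:.3f}")
print(f"Type 2 error rate at p = 0.6 with n = 100:       {type2:.3f}")
```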
Example 4
a) In a US court, the defendant is either innocent (H0)
or guilty (HA). What does a Type 1 Error represent
in this context? What does a Type 2 Error
represent?
b) How could we reduce the Type 1 Error rate in US
courts? What influence would this have on the
Type 2 Error rate?
c) How could we reduce the Type 2 Error rate in US
courts? What influence would this have on the
Type 1 Error rate?
31
Error "Tug-of-War"
The previous example provides an important
lesson: if we reduce how often we make one type
of error, we generally make more of the other type.

The only way to reduce both is to take larger


samples. Even then, the errors will rarely be 0.

32
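The tug-of-war can also be seen numerically. The sketch below (a rough normal-approximation calculation with invented parameters, not anything from the slides) shows that tightening α lowers the Type 1 rate but raises the Type 2 rate at a fixed sample size, while a larger sample lowers the Type 2 rate for the same α.

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def type2_rate(alpha_z, n, p0=0.5, p_true=0.6):
    """Approximate P(fail to reject H0: p = p0) when the truth is p_true,
    for a one-sided test that rejects when z > alpha_z."""
    cutoff = p0 + alpha_z * sqrt(p0 * (1 - p0) / n)   # reject when p-hat exceeds this
    se_true = sqrt(p_true * (1 - p_true) / n)
    return norm_cdf((cutoff - p_true) / se_true)

# One-sided critical values: z = 1.282 (alpha = 0.10), 1.645 (0.05), 2.326 (0.01).
for alpha, z in [(0.10, 1.282), (0.05, 1.645), (0.01, 2.326)]:
    print(f"alpha = {alpha:.2f}: Type 2 rate ~ {type2_rate(z, n=100):.2f} (n = 100), "
          f"~ {type2_rate(z, n=400):.2f} (n = 400)")
```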
Choosing the Significance Level

Usually when choosing a significance level, we will use the industry standard of α = 0.05. However, it is often helpful to adjust the significance level based on the application.

For example, α = 0.10 and α = 0.01 are also commonly used levels of significance.

33
Choosing the Significance Level

If making a Type 1 Error is dangerous or especially costly, we should choose a small significance level (e.g. 0.01). In this case, we would demand very strong evidence favoring HA before rejecting H0.
If a Type 2 Error is relatively more dangerous or
much more costly than a Type 1 Error, then we
should choose a higher significance level (e.g.
0.10). In this case, we would want to be cautious
about failing to reject H0 when the null is actually
false.
34
Choosing the Significance Level

The significance level selected for a test should reflect the real-world consequences associated with making a Type 1 or Type 2 Error.
If Type 1 is much worse than Type 2, reduce the
significance level to reduce the false positives.
If Type 2 is much worse than Type 1, increase
the significance level to reduce the false
negatives.

35
Example 5
A part inside of a machine is very expensive to
replace. However, the machine usually functions
properly even if this part is broken, so the part is
replaced only if we are extremely certain it is
broken based on a series of measurements.

Identify appropriate hypotheses for this test (in plain language) and suggest an appropriate significance level.

36
Example 5 Solution
H0: The part is not broken.
HA: The part is broken.
A Type 1 Error would be made if the part really is not
broken but we think it is and replace it (at great
expense).
A Type 2 Error would be made if the part really is
broken but we do not think so and leave it alone (with
no great cost).
Financially, a Type 1 Error is much worse, so reducing
the significance level to 0.01 would be appropriate.
37
Example 6
Describe the two errors for the following hypotheses.
Which error is more serious?

H0: WAMAP is safe
HA: WAMAP is not safe

38
