
# Hypothesis Testing:

## Continuous Variables (2 Sample)

I. Introduction
II. Brief Review & Discussion of Logic
III. Independent Groups
   1. Formula
   2. Formal Example - [Minitab] [Spreadsheet]
      1. Research Question
      2. Hypotheses
      3. Assumptions
      4. Decision Rules
      5. Computation
      6. Decision
IV. Dependent Groups
   1. Discussion
   2. Formula
   3. Formal Example - [Minitab]
      1. Research Question
      2. Hypotheses
      3. Assumptions
      4. Decision Rules
      5. Computation
      6. Decision

Homework

## I. Introduction

Designs that involve two samples are used far more often than the one-sample designs discussed
previously, for two reasons:

1. It is rare that µ or σ is known. When using two samples, neither of these parameters is
required.
2. Since two groups (or measurements) are included, one will serve as a concurrent control.
In other words, the two groups (or measurements) occur closely together in time and
space. Thus, the treatment and testing circumstances (which introduce many potential
extraneous variables) can be better controlled. For example, in terms of our IQ/"Bad
Kids" example, perhaps the IQ of the population was measured 2 years previously and the IQ
in that area was increasing at the rate of 2-4 points per year (for whatever reason; you can
be creative here). A difference between the sample and the old population value could then
reflect the passage of time rather than anything about the sample; a concurrent control
removes that ambiguity.

Recall our first example of the experimental method at the very beginning of the semester
involving the effects of marijuana on memory. The ability to analyze such an experiment has
been one of the major goals of this course. In this experiment there were two groups, and we need
to be able to compare their means and see if the difference is worth paying attention to (i.e., did
marijuana have an effect on memory performance?).

## II. Brief Review & Discussion of Logic

Let's take a step back and review what we have covered thus far about inferential statistics.
Actually it goes back a little further than that to where we learned about standard scores and the
normal distribution. The key point was that area under the curve implies probability. To
determine these probabilities, we computed the standard or z scores. That is:

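$$z = \frac{X - \bar{X}}{s}$$

for samples, or in the general case:

$$z = \frac{\text{score} - \text{mean}}{\text{standard deviation}}$$

We also saw that the sampling distribution of the mean was a normal distribution with mean and
standard deviation:

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

respectively.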

So we were able to use Z scores to determine the probability that a particular sample mean was
drawn from a given population. That is:
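
Case I (1 sample, µ and σ known):

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$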
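
As a minimal sketch of Case I, the z test can be computed with Python's standard library. The population values (µ = 100, σ = 15) and the sample (N = 25, mean of 106) are hypothetical numbers chosen only for illustration; they are not taken from the IQ example.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu, sigma, n):
    """Return (z, two-tailed p) for a sample mean when mu and sigma are known."""
    standard_error = sigma / math.sqrt(n)   # SD of the sampling distribution of the mean
    z = (sample_mean - mu) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_two_tailed

# Hypothetical values, for illustration only.
z, p = one_sample_z(sample_mean=106, mu=100, sigma=15, n=25)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")  # z = 2.00
```

The area under the normal curve beyond ±z is what turns the standard score into a probability.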

Then we went on to a more realistic situation in which the population standard deviation was not
known. We estimated it from the sample standard deviation. This complicated things a bit in that
the sampling distribution, while still symmetric and bell-shaped, is no longer exactly normal: its
kurtosis (the heaviness of its tails) differs as a function of the sample size (or more accurately,
the df). This family of distributions was called Student's t and the formula became:

Case II (1 sample, σ unknown):

$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{N}}, \qquad df = N - 1$$
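
A short sketch of Case II follows, assuming the usual one-sample t formula in which the standard error is estimated as s divided by the square root of N. The five scores and the value µ = 100 are made up for illustration.

```python
import math
from statistics import mean, stdev

def one_sample_t(scores, mu):
    """Return (t, df) for a one-sample t test when sigma is unknown."""
    n = len(scores)
    standard_error = stdev(scores) / math.sqrt(n)  # stdev() divides by N - 1
    t = (mean(scores) - mu) / standard_error
    return t, n - 1

scores = [104, 98, 110, 107, 101]  # hypothetical sample
t, df = one_sample_t(scores, mu=100)
print(f"t = {t:.3f} with df = {df}")
```

The resulting t is compared with the critical value for the matching df rather than with the normal curve.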

However, as was noted earlier, rarely do we know any of the population parameters, and it is
desirable to have a concurrent control. So we need another sampling distribution to help us
compute the relevant probabilities.

The sampling distribution involves two means, so it is called the sampling distribution of the
difference between means. Note that if the two means are the same (when there is no effect of the
IV), the difference between them will be zero. So the value that we are interested in here (in terms
of the general formula for z given above) is the difference between the means, that is:

$$\bar{X}_1 - \bar{X}_2$$

The mean and standard deviation of the sampling distribution of the difference between means
are given by:

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma_{\bar{X}_1 - \bar{X}_2}$$

respectively.

The latter is called the standard error of the difference between means. Since the sample standard
deviation is again used to estimate the population value, the sampling distribution of the
difference between means will also be distributed as t (the family of bell-shaped distributions that
differ in kurtosis as a function of the df). So the formula becomes:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$

However, the null hypothesis says that:

HO: µ1=µ2

That is, the two means come from the same population; there is no difference between them (i.e.,
µ1-µ2=0). Thus, the formula reduces to:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

All we need to do now is determine the formula for the standard error. However, this formula
differs depending on whether we are dealing with independent or dependent groups. Once we
understand this distinction, we can move on to Case III (i.e., independent groups) and Case IV
(i.e., dependent groups) of hypothesis testing with continuous variables.

With the independent groups design, the subjects in each of the two groups are different and
unrelated in any way. The most common type of dependent groups design is also called a
within subjects or repeated measures design, because the same subjects (thus actually only one
group) are tested twice.

## III. Independent Groups (Case III)

1. Formula

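The defining formula (when the sample sizes are equal) for the standard error of the difference
between means is:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}$$

And thus the formula for the t value is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}}$$

The computational formulas (which will also handle unequal sample sizes) are given by:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left[\frac{\left(\sum X_1^2 - \frac{(\sum X_1)^2}{N_1}\right) + \left(\sum X_2^2 - \frac{(\sum X_2)^2}{N_2}\right)}{N_1 + N_2 - 2}\right]\left[\frac{1}{N_1} + \frac{1}{N_2}\right]}$$

And:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

Since two variances are used in estimating the standard error of the difference between means,
the degrees of freedom will equal the sum of the degrees of freedom for each of the variance
estimates, that is:

$$df = (N_1 - 1) + (N_2 - 1) = N_1 + N_2 - 2$$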
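
The relationship between the defining and computational formulas can be checked numerically. The sketch below assumes the usual forms of both: the defining standard error is the square root of s1²/N1 + s2²/N2, and the computational one pools the sums of squares over N1 + N2 - 2. The two small groups are hypothetical. With equal sample sizes the two versions should agree.

```python
import math
from statistics import mean, variance

def t_defining(x1, x2):
    """Defining formula: SE = sqrt(s1^2/N1 + s2^2/N2); intended for equal N."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / se

def t_computational(x1, x2):
    """Computational formula with pooled sums of squares; also handles unequal N."""
    n1, n2 = len(x1), len(x2)
    ss1 = sum(x * x for x in x1) - sum(x1) ** 2 / n1  # sum of squares, group 1
    ss2 = sum(x * x for x in x2) - sum(x2) ** 2 / n2  # sum of squares, group 2
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2

group_1 = [3, 5, 4, 6, 7]    # hypothetical scores
group_2 = [6, 8, 7, 9, 10]   # hypothetical scores

t_pooled, df = t_computational(group_1, group_2)
print(t_defining(group_1, group_2), t_pooled, df)
```

When the group sizes differ, the pooled version weights each variance by its degrees of freedom, so only the computational form should be used.
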
2. Formal Example - [Minitab] [Spreadsheet]

Suppose you are a researcher interested in the factors influencing paper grading by professors.
A hunch (and/or previous research) might lead you to predict that papers that are typed
are rated higher than papers that are handwritten. Research to date, though, has only been
correlational, and thus little can be said in terms of a cause and effect relationship.

So you have 10 freshman students currently taking English as well as an introductory psychology
course each write one paper. Each student provides two copies of the paper (one typed and
one handwritten). Next, you enlist the aid of 20 English instructors and randomly assign 10
instructors to each of two groups. Each instructor in one group (the control group) will grade
each of the 10 papers that are handwritten, while each instructor in the second group (the
experimental group) will grade the same papers that are typed.

1. Research Question
Does typing a paper (as compared to handwriting it) influence the grade it receives?

2. Hypotheses

| | In Symbols | In Words |
|---|---|---|
| HO | µ1 = µ2 | Typing has no effect on the grade a paper receives (as compared to a handwritten paper). |
| HA | µ1 ≠ µ2 | Typing influences the grade for better or worse. |

3. Assumptions
1. Our subjects were chosen randomly from the population.
2. The groups are independent.
3. There is homogeneity of variance. That is, the amount of variability in the
DV is about equal in each of the groups. When the sample sizes are
reasonably large and the number of subjects in each group is about equal,
the t test is robust to violations of this assumption. This means that it is
strong and can tolerate some violations of its assumptions.
4. The sampling distribution of the difference between means is normal in shape.
In other words, the DV should be normally distributed in the population.
5. The null hypothesis is true (the one assumption we are actually testing).

4. Decision Rules
Using alpha of .05 with a two-tailed test and df=N1+N2-2=9+10-2=17, we
determine from the t table that the critical value is 2.110. Thus:

If tobs ≤ -2.110 or tobs ≥ 2.110, then reject HO.

If tobs > -2.110 and tobs < 2.110, then do not reject HO.
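
The decision rule can be written as a small function. This is only a sketch: the function name is hypothetical, the critical value is copied from the t table value quoted above rather than computed, and the two observed t values are made up to show each branch.

```python
def two_tailed_decision(t_obs, t_crit):
    """Apply the two-tailed decision rule for a given (positive) critical value."""
    if t_obs <= -t_crit or t_obs >= t_crit:
        return "reject H0"
    return "do not reject H0"

T_CRIT_DF17 = 2.110  # alpha = .05, two-tailed, df = 17 (from the t table)

print(two_tailed_decision(-2.5, T_CRIT_DF17))  # hypothetical t in the rejection region
print(two_tailed_decision(1.3, T_CRIT_DF17))   # hypothetical t outside it
```
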

5. Computation
Since we are not interested in the differences between the scores of the 10 papers
graded by an instructor, we simply calculate the mean grade given by each
instructor. Note that one of the instructors in the Written Group had to be
excluded because their dog ate the papers they were supposed to grade. Thus, we
have 19 means (9 in the Written group and 10 in the Typed group). To describe the
data, we present the means and standard deviations for each of the two groups, that is:

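| | Written (1) | Typed (2) |
|---|---|---|
| Data | 81 | 84 |
| | 81 | 89 |
| | 79 | 89 |
| | 80 | 81 |
| | 84 | 87 |
| | 87 | 82 |
| | 75 | 87 |
| | 83 | 85 |
| | 88 | 89 |
| | | 83 |
| Mean | 82.0 | 85.6 |
| s | 4.03 | 3.03 |
| N | 9 | 10 |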

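Now the inferential question is whether this difference between means is worth
paying attention to. Thus, we will use a between groups t test to answer this
question.
Substituting the appropriate values (the sums of squared deviations are 130 for the
Written group and 82.4 for the Typed group) gives:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left[\frac{130 + 82.4}{9 + 10 - 2}\right]\left[\frac{1}{9} + \frac{1}{10}\right]} = \sqrt{(12.49)(0.211)} = 1.62$$

$$t = \frac{82.0 - 85.6}{1.62} = -2.222$$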
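
As a cross-check, the between groups t value can be recomputed from the raw data in the table above. This is a minimal sketch that assumes the pooled-variance computational formula. Carried at full precision it gives a t of about -2.217; the small gap from the -2.222 reported in the Decision step is consistent with rounding the standard error to 1.62 before dividing.

```python
import math
from statistics import mean, stdev

written = [81, 81, 79, 80, 84, 87, 75, 83, 88]
typed = [84, 89, 89, 81, 87, 82, 87, 85, 89, 83]

n1, n2 = len(written), len(typed)
ss1 = sum((x - mean(written)) ** 2 for x in written)  # sum of squares, Written
ss2 = sum((x - mean(typed)) ** 2 for x in typed)      # sum of squares, Typed

pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
t_obs = (mean(written) - mean(typed)) / standard_error

print(f"means: {mean(written):.1f}, {mean(typed):.1f}")
print(f"s: {stdev(written):.2f}, {stdev(typed):.2f}")
print(f"SE = {standard_error:.3f}, t = {t_obs:.3f}, df = {n1 + n2 - 2}")
```

Either way the observed t falls below -2.110, so the decision is unchanged.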

6. Decision
Since -2.222 (tobs) < -2.110 (tcrit) we reject HO and assert the alternative. In other
words, we conclude that typing a paper improves the grade it receives. Notice that
we have actually gone beyond the alternative hypothesis by specifying that the
effect has a direction (typing is good).

## IV. Dependent Groups (Case IV)

1. Discussion
As noted earlier, the most common type of Dependent Groups Design is also called a
Within Subjects or Repeated Measures Design, because the same subjects (thus, actually
only one group) are tested twice. There is another situation, though, in which this analysis
is sometimes used. It is called the Matched Groups Design. In this case, there are two
groups, but they are matched on some variable that is highly and positively correlated
with the DV. The procedures involved in matching will be presented more clearly below
in the formal example.

2. Formula
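In this case, the standard error of the difference between means is given by:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2r\,s_{\bar{X}_1}s_{\bar{X}_2}}$$

where r is the correlation between the two sets of scores.
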
Notice that the formula requires the computation of the correlation between the two sets of
scores. It is here that we see the potential advantage to this design. That is, the error term (the
standard error of the difference between means) is decreased in direct proportion to the
magnitude of this correlation, which results in a potentially more powerful or sensitive test. The
disadvantage though is the loss of degrees of freedom. The N here refers to the number of pairs
of scores (for an individual or matched pair of individuals). Thus, the degrees of freedom is half
what we would have if we had used a between groups approach (i.e., N-1 is 1/2 of N1+N2-2). The
trick is to make sure the correlation is large enough to offset the loss of df.

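The formula above would be very cumbersome to use. Fortunately, there is another technique
available for obtaining the t value called the Direct Difference Method. If the difference between
the X and Y scores is designated as D (i.e., D=X-Y), then we may restate the null
and alternative hypotheses as:

| | In Symbols |
|---|---|
| HO | µD = 0 |
| HA | µD ≠ 0 |

And with µD = 0 under the null hypothesis, the general formula

$$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}$$

becomes:

$$t = \frac{\bar{D}}{s_{\bar{D}}}$$

Below is the derivation of the computational formula:

$$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}, \qquad s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{N}}{N - 1}}$$

$$t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{N}}{N(N - 1)}}}$$

where the df=N-1 and N refers to the number of pairs of scores.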
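
The claim that the Direct Difference Method gives the same error term as the correlation-based formula can be checked numerically. The sketch below assumes the standard forms of both: the square root of s²(X̄) + s²(Ȳ) - 2r·s(X̄)·s(Ȳ) for the correlation version, and the square root of [ΣD² - (ΣD)²/N] / [N(N-1)] for the direct difference version. The five pairs of scores are hypothetical.

```python
import math
from statistics import mean, stdev

def se_from_correlation(x, y):
    """Standard error of the difference using the correlation between x and y."""
    n = len(x)
    sx, sy = stdev(x), stdev(y)
    mx, my = mean(x), mean(y)
    # Pearson r from the deviation cross-products
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    se_x, se_y = sx / math.sqrt(n), sy / math.sqrt(n)
    return math.sqrt(se_x ** 2 + se_y ** 2 - 2 * r * se_x * se_y)

def se_direct_difference(x, y):
    """Standard error of the mean difference computed from D = X - Y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    ss_d = sum(v * v for v in d) - sum(d) ** 2 / n
    return math.sqrt(ss_d / (n * (n - 1)))

x = [10, 12, 9, 14, 11]   # hypothetical first scores
y = [12, 15, 9, 15, 14]   # hypothetical second scores

print(se_from_correlation(x, y), se_direct_difference(x, y))
```

Because the two routes agree, the correlation never has to be computed explicitly when working with difference scores.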

3. Formal Example - [Minitab]

Suppose you are interested in reaction times to different colored lights (especially green and
red). We could use either:

• Repeated measures design - test each subject for a number of trials, such as
GGRRGRRG, etc. Then compute the average speed to each color light for each
subject.
• Matched groups design - test all subjects' reaction times to white light for a
given number of trials. Using this data, create two matched groups, that is, take
the two quickest subjects and randomly assign one to each of the groups. Then
take the next two quickest subjects and randomly assign one of them to each of
the groups, etc. Ex:

| Ranked Data | Red | Green |
|---|---|---|
| 1, 2 | 2 | 1 |
| 3, 4 | 3 | 4 |
| 5, 6 | 6 | 5 |
| 7, 8 | 8 | 7 |
| ... | ... | ... |

Note that the number of subjects must be divisible by the number of groups.
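
The matching procedure described above can be sketched in a few lines. The function name, subject labels, baseline times and random seed are all hypothetical; the logic follows the description: rank subjects by their white-light reaction time, take them two at a time, and randomly assign one member of each pair to each group.

```python
import random

def matched_groups(baseline_times, seed=None):
    """Rank subjects by baseline time, pair neighbours, and randomly split each pair."""
    if len(baseline_times) % 2 != 0:
        raise ValueError("number of subjects must be divisible by the number of groups (2)")
    rng = random.Random(seed)
    ranked = sorted(baseline_times, key=baseline_times.get)  # quickest subject first
    red, green = [], []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)          # random assignment within the matched pair
        red.append(pair[0])
        green.append(pair[1])
    return red, green

# Hypothetical white-light reaction times keyed by subject label.
baseline = {"S1": 31, "S2": 24, "S3": 28, "S4": 35,
            "S5": 22, "S6": 30, "S7": 26, "S8": 33}
red, green = matched_groups(baseline, seed=1)
print(red, green)
```

Each position in the two lists holds one matched pair, which is what allows the scores to be analyzed as pairs later on.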

1. Research Question
Does reaction time to red and green lights differ?

2. Hypotheses

| | In Symbols | In Words |
|---|---|---|
| HO | µ1 = µ2 | There is no difference in reaction times between red and green lights. |
| HA | µ1 ≠ µ2 | There is a difference in reaction times between red and green lights. |

3. Assumptions
1. Our subjects were chosen randomly from the population.
2. The scores of the two conditions are correlated (i.e., the groups are
dependent).
3. The sampling distribution of the difference between means is normal in
shape. In other words, the DV should be normally distributed in the
population.
4. The null hypothesis.

4. Decision Rules
We will test 10 (or 20 if matched) subjects. Using alpha of .05 with a two-tailed
test and df=N-1=9, we determine from the t table that the critical value is 2.262.

Thus:

If tobs ≤ -2.262 or tobs ≥ 2.262, then reject HO.

If tobs > -2.262 and tobs < 2.262, then do not reject HO.

5. Computation
First we describe the data by computing the means for each condition/group.
While we are at it, we might as well compute the difference scores and their
squares (since we will need them for the analysis).

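| Subject (or pair) | X (red) | Y (green) | D | D² |
|---|---|---|---|---|
| 1 | 18 | 22 | -4 | 16 |
| 2 | 16 | 20 | -4 | 16 |
| 3 | 23 | 29 | -6 | 36 |
| 4 | 30 | 35 | -5 | 25 |
| 5 | 32 | 27 | 5 | 25 |
| 6 | 30 | 29 | 1 | 1 |
| 7 | 31 | 33 | -2 | 4 |
| 8 | 25 | 29 | -4 | 16 |
| 9 | 27 | 31 | -4 | 16 |
| 10 | 21 | 24 | -3 | 9 |
| Sum | 253 | 279 | -26 | 164 |
| Mean | 25.3 | 27.9 | -2.6 | |
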
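Then, for the inferential test, we will use a within groups t test (the direct
difference method) and thus we have the formula:

$$t = \frac{\bar{D}}{\sqrt{\dfrac{\sum D^2 - \frac{(\sum D)^2}{N}}{N(N - 1)}}}$$

And substituting the appropriate values (ΣD = -26, ΣD² = 164, N = 10) gives:

$$t = \frac{-2.6}{\sqrt{\dfrac{164 - \frac{(-26)^2}{10}}{10(9)}}} = \frac{-2.6}{\sqrt{1.071}} = \frac{-2.6}{1.035} = -2.512$$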
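
The same computation can be sketched in Python from the red and green scores in the table above, assuming the direct difference formula with the standard error taken as the square root of [ΣD² - (ΣD)²/N] / [N(N-1)].

```python
import math

red = [18, 16, 23, 30, 32, 30, 31, 25, 27, 21]
green = [22, 20, 29, 35, 27, 29, 33, 29, 31, 24]

d = [x - y for x, y in zip(red, green)]  # difference scores D = X - Y
n = len(d)                               # number of pairs
sum_d = sum(d)
sum_d2 = sum(v * v for v in d)

mean_d = sum_d / n
standard_error = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
t_obs = mean_d / standard_error

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}")
print(f"t = {t_obs:.3f} with df = {n - 1}")
```

The result agrees with the t of -2.512 used in the Decision step.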

6. Decision
Since -2.512 (tobs) < -2.262 (tcrit) we reject HO and assert the alternative. In other
words, we conclude that reaction time is quicker to red as compared to green light.

# TWO-SAMPLE TEST OF A HYPOTHESIS

A. Overview of Two-Sample Hypothesis Testing
B. Step-By-Step Instructions for Performing a Two-Sample Hypothesis Test in Excel
C. Interpreting the Results of the Test

## A. Overview of Two-Sample Hypothesis Testing

Two-sample hypothesis testing is statistical analysis designed to test if there is a difference
between two means from two different populations. For example, a two-sample hypothesis test
could be used to test if there is a difference in the mean salary between male and female doctors
in the New York City area. A two-sample hypothesis test could also be used to test if the mean
number of defective parts produced using assembly line A is greater than the mean number of
defective parts produced using assembly line B.

Similar to one-sample hypothesis tests, a one-tailed or two-tailed test of the null hypothesis
can be performed in two-sample hypothesis testing as well. The two-sample hypothesis test of no
difference between the mean salaries of male and female doctors in the New York City area is an
example of a two-tailed test. The test of whether or not the mean number of defective parts
produced on assembly line A is greater than the mean number of defective parts produced on
assembly line B is an example of a one-tailed test. The following section provides step-by-step
instructions for performing a two-sample test of a hypothesis in Excel.

## B. Step-By-Step Instructions for Performing a Two-Sample Hypothesis Test in Excel

Big Foods Grocery has two grocery stores located in Johnston City. One store is located on
First Street and the other on Main Street, and each is run by a different manager. Each manager
claims that her store's layout maximizes the amounts customers will purchase on impulse.
Both managers surveyed a sample of their customers and asked them how much more they spent
than they had planned to; in other words, how much did they spend on impulse? The following
table shows the sample data collected from the two stores.

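| First Street | Main Street |
|---|---|
| 15.78 | 15.19 |
| 17.73 | 18.22 |
| 10.61 | 15.38 |
| 15.79 | 15.96 |
| 14.22 | 21.92 |
| 13.82 | 12.87 |
| 13.45 | 12.47 |
| 12.86 | 13.96 |
| 10.82 | 13.79 |
| 12.85 | 13.74 |
| | 18.4 |
| | 18.57 |
| | 17.79 |
| | 10.83 |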

Upper-level management at Big Foods Grocery wants to know if there is a difference in the mean
amounts purchased on impulse at the two stores and has hired you to perform the statistical
analysis. This question can be addressed by performing a two-sample test of a hypothesis. The
following describes the steps to perform the test in Excel.

Step 1. The first step is to state the hypothesis to be tested, called the null hypothesis, and
the alternative hypothesis. In this example, upper-level management wants to know if there is a
difference in the mean amounts purchased on impulse at the two stores. An alternative way to
state this question is "Is the mean amount purchased on impulse at the First Street store equal
to the mean amount purchased at the Main Street store?" Recall that the "equality" part of the
hypothesis is always stated in the null hypothesis. Therefore, the null and alternative
hypotheses for this example are:

$$H_0: \mu_f = \mu_m \qquad H_A: \mu_f \neq \mu_m$$

where μf is the mean amount spent on impulse in the First Street store and μm is the mean amount
spent on impulse in the Main Street store. Note, this is a two-tailed test of a hypothesis.

Step 2. Select the level of significance to be used in the test. The level of significance is
the probability of rejecting the null hypothesis when it is true. Common significance levels
are .10, .05, and .01. Suppose you chose a .05 level of significance, meaning there is a 5%
chance that you will reject the null hypothesis when it is true.
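
The meaning of the significance level can be illustrated with a small simulation: when both samples really come from the same population, a test run at alpha = .05 should reject the null hypothesis in roughly 5% of repeated experiments. To stay within Python's standard library, this sketch uses a z test with a known σ of 1 rather than the t test used in this example; the sample size, number of trials and random seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=20, trials=2000, seed=0):
    """Estimate how often a true null hypothesis is rejected (z test, sigma = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same population, so H0 is true by construction.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(a) / n - sum(b) / n) / math.sqrt(1 / n + 1 / n)
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"proportion of false rejections: {rate:.3f}")
```

Lowering alpha to .01 in the call makes false rejections rarer, at the cost of making real differences harder to detect.
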
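
Step 3. Select the test statistic that is appropriate for this test. In general, you will need
to decide between using a z test statistic or a t test statistic. If one or more of the sample
sizes is less than 30 (as in this problem), a t statistic is appropriate. The test statistic for
this example is:

$$t = \frac{\bar{X}_f - \bar{X}_m}{\sqrt{s_p^2\left(\frac{1}{n_f} + \frac{1}{n_m}\right)}}, \qquad s_p^2 = \frac{(n_f - 1)s_f^2 + (n_m - 1)s_m^2}{n_f + n_m - 2}$$

where s_p² is the pooled variance estimate, in keeping with the equal-variances t test used in
Excel below.
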
Step 4. Determine the rejection region. The rejection region defines the conditions under which
the null hypothesis is rejected. (See the section One-Sample Test of a Hypothesis for more
details on the rejection region.) The critical values for this test are based on degrees of
freedom, and in this problem the degrees of freedom are equal to 22 (10 + 14 - 2). The critical
t values are -2.074 and 2.074. Therefore, if the test statistic is less than -2.074 or greater
than 2.074, we will reject the null hypothesis in favor of the alternative.

Step 5. Perform the hypothesis test. The above calculations are easily computed in Excel. First,
input the data into an Excel worksheet, with the First Street amounts in cells A2:A11 and the
Main Street amounts in cells B2:B15 (the ranges used in the dialog below).

From the Tools pull-down menu, select Data Analysis, and then select t-Test: Two-Sample
Assuming Equal Variances.

Click OK in the Data Analysis window and the t-Test: Two-Sample Assuming Equal Variances
window opens.

In the Variable 1 Range field, type A2:A11, or click the worksheet icon to the right of the
Variable 1 Range field and click and drag the cursor over the data in column A. In the
Variable 2 Range field, type B2:B15, or click the worksheet icon to the right of the Variable 2
Range field and click and drag the cursor over the data in column B. In the Hypothesized Mean
Difference field type 0 and in the Output Options box, type D1 in the Output Range field.

[Screenshot: the completed t-Test: Two-Sample Assuming Equal Variances window]

Click OK in the t-Test: Two-Sample Assuming Equal Variances window and the results of the
hypothesis test appear.

[Screenshot: Excel output of the t-Test: Two-Sample Assuming Equal Variances]

## C. Interpreting the Results of the Test

The results of the two-sample test are shown in the Excel output. Excel calculates the test
statistic and critical values for the test. Recall that if the test statistic is less than
-2.074 or greater than 2.074, we reject the null hypothesis in favor of the alternative. The
test statistic is -1.649, which does not fall into the rejection region, so we fail to reject
the null hypothesis of no difference between the means of the two populations. In other words,
at the .05 level of significance, the data do not provide sufficient evidence that the mean
amount spent on impulse at the First Street grocery store differs from the mean amount spent on
impulse at the Main Street grocery store.
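
As a cross-check outside Excel, the pooled-variance t statistic can be recomputed from the Big Foods data with Python's standard library. This is a sketch under the same equal-variances assumption as the Excel tool. The critical value 2.074 is copied from the text rather than computed, since the standard library has no t distribution.

```python
import math
from statistics import mean, variance

first_street = [15.78, 17.73, 10.61, 15.79, 14.22,
                13.82, 13.45, 12.86, 10.82, 12.85]
main_street = [15.19, 18.22, 15.38, 15.96, 21.92, 12.87, 12.47,
               13.96, 13.79, 13.74, 18.4, 18.57, 17.79, 10.83]

n_f, n_m = len(first_street), len(main_street)
df = n_f + n_m - 2

# Pooled variance: each sample variance weighted by its degrees of freedom.
pooled_variance = ((n_f - 1) * variance(first_street)
                   + (n_m - 1) * variance(main_street)) / df
standard_error = math.sqrt(pooled_variance * (1 / n_f + 1 / n_m))
t_stat = (mean(first_street) - mean(main_street)) / standard_error

T_CRIT = 2.074  # two-tailed, alpha = .05, df = 22 (from the text)
reject_null = abs(t_stat) >= T_CRIT

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_null}")
```

The statistic comes out at about -1.65, in line with the value Excel reports, and the null hypothesis is not rejected.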
Elon University