
Chapter 13 (independent study unit)

χ² Tests for Count Data

Chi-square Tests for One-Way Tables


- https://www.youtube.com/watch?v=gkgyg-eR0TQ&feature=youtu.be
Chi-square Tests: Goodness of Fit for the Binomial Distribution
- https://www.youtube.com/watch?v=O7wy6iBFdE8&feature=youtu.be
Chi-square Tests of Independence (Chi-square Tests for Two-Way Tables)
- https://www.youtube.com/watch?v=L1QPBGoDmT0&feature=youtu.be
Chi-square tests for count data: Finding the p-value
- https://www.youtube.com/watch?v=HwD7ekD5l0g&feature=youtu.be

13.1 Introduction

χ² tests are used only for hypothesis testing, not for confidence intervals, and there is (essentially) one formula for the test statistic in all of the different situations. The calculations are straightforward but can be long and tedious, so we often use software to do them for us.

We will investigate two main variants of the χ² test:


1. Tests for one-way tables. These tests are often called goodness-of-fit tests.
2. Tests for two-way tables. These tests are often called χ² tests of independence.

In both cases, if the null hypothesis is true, the test statistic has an approximate χ² distribution.

13.2 χ² Tests for One-Way Tables

Observations are classified according to a single categorical variable.

Example 13.1. Does the distribution of ABO blood type among sufferers of pancreatic cancer differ from the distribution of blood
type for the general population? A study investigated a possible relationship between blood type and pancreatic cancer by looking at
the data from a large sample of nurses in the US. The table below illustrates the breakdown of ABO blood type for a sample of 200
nurses that developed pancreatic cancer.

For the population of all nurses in the US, 36% have blood type A, 13% have blood type B, 8% have blood type AB, and 43% have
blood type O. We will assume that these values represent the true distribution of blood types for American nurses.
Blood Type   A      B      AB     O      Total
Count        71     36     23     70     200
Percentage   35.5   18.0   11.5   35.0   100

We can test the null hypothesis that the distribution of ABO blood type for nurses with pancreatic cancer is the same as for the
population of nurses:

$$H_0: p_A = 0.36,\; p_B = 0.13,\; p_{AB} = 0.08,\; p_O = 0.43$$

The alternative hypothesis is that the hypothesized proportions are not all correct (in other words, that the distribution of blood type among pancreatic cancer sufferers differs from the distribution of blood type for the population). We can test the null hypothesis with a χ² test.

13.2.1 The χ² Test Statistic

In χ² tests for one-way tables, we test the null hypothesis that the true proportions for the c categories of a categorical variable are equal to certain hypothesized values:

$$H_0: p_1 = p_{10},\; p_2 = p_{20},\; \ldots,\; p_c = p_{c0}$$

The appropriate test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Observed represents the observed sample counts in the various categories. Expected represents the count we would expect to get, on average, if the null hypothesis were true. For each category:

Expected count = Hypothesized proportion × sample size

If the null hypothesis is true, the test statistic has an approximate χ² distribution. For this type of test, the degrees of freedom are the number of categories minus one. The approximation works best for large sample sizes; the assumptions are discussed in Section 13.4.2.

If the observed counts are close to the expected counts, the test statistic will be small. If they are very different, the test statistic will be large. The larger the value of the test statistic, the greater the evidence against the null hypothesis.

The p-value is the area to the right of the observed test statistic under the χ² distribution with the appropriate degrees of freedom.

Returning to Example 13.1:

Blood Type       A                   B                   AB                  O                   Total
Observed Count   71                  36                  23                  70                  200
Expected Count   0.36 × 200 = 72.0   0.13 × 200 = 26.0   0.08 × 200 = 16.0   0.43 × 200 = 86.0   200

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(71 - 72)^2}{72} + \frac{(36 - 26)^2}{26} + \frac{(23 - 16)^2}{16} + \frac{(70 - 86)^2}{86} = 9.90$$
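The calculation above can be checked with a few lines of code. The following is a minimal Python sketch (standard library only; Python is our choice for illustration, since the notes show only R and S-Plus output) that computes the expected counts, the χ² statistic, and the degrees of freedom for Example 13.1. The variable names are illustrative.

```python
# Example 13.1: observed blood-type counts for 200 nurses with pancreatic cancer
observed = [71, 36, 23, 70]              # A, B, AB, O
hypothesized = [0.36, 0.13, 0.08, 0.43]  # proportions under H0

n = sum(observed)  # sample size, 200

# Expected count = hypothesized proportion x sample size
expected = [p * n for p in hypothesized]

# Sum of (Observed - Expected)^2 / Expected over all cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one
df = len(observed) - 1

print([round(e, 1) for e in expected])  # [72.0, 26.0, 16.0, 86.0]
print(round(chi_sq, 2), df)             # 9.9 3
```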

Since there are 4 squared terms, there are 4 – 1 = 3 degrees of freedom. The p-value is illustrated below.

Using software, we find that the p-value is 0.0194. (Using a χ² table, we can only determine that the p-value falls in an interval, such as 0.01 < p-value < 0.025.)
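Without statistical software, the right-tail area can still be computed exactly, because for integer degrees of freedom the χ² upper-tail probability has a closed form. The helper `chi2_sf` below is a hypothetical function written for illustration (not part of any library), using only Python's `math` module; treat it as a sketch under that assumption rather than a replacement for tested statistical software.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X >= x) for a chi-square distribution with integer df.

    Hypothetical helper for illustration: uses the closed-form series that
    holds for integer degrees of freedom, so only the math module is needed.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        series = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


# Example 13.1: chi-square statistic 9.90 with 3 degrees of freedom
p_value = chi2_sf(9.90, 3)
print(round(p_value, 3))  # 0.019
```

If SciPy is installed, `scipy.stats.chi2.sf(9.90, 3)` should return essentially the same value.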

The p-value is small, indicating fairly strong evidence against the null hypothesis (small enough to be significant at the commonly
chosen significance level of 0.05).

This means that there is fairly strong evidence that the distribution of blood types for American nurses with pancreatic cancer differs
from the distribution of blood types for the entire population of American nurses. In short, there is some evidence of an association
between blood type and pancreatic cancer.

Example 13.2 In a famous genetics experiment in 1905, William Bateson and Reginald Punnett investigated inheritance in sweet peas. Purple flowers and long grains are dominant traits, so the peas resulting from the first-generation cross all had purple flowers and long grains. Under Mendelian inheritance with independent assortment, crossing these first-generation plants should result in a 9:3:3:1 phenotypic ratio.

                 Phenotype
                 Purple/long          Purple/round   Red/long   Red/round
Observed count   284                  21             21         55
Expected count   9/16 × 381 = 214.3   71.4           71.4       23.8

There appear to be large differences between the observed and expected counts. Let's carry out a hypothesis test of the hypotheses:

H0: The true ratio of phenotypes is 9:3:3:1
Ha: The true ratio of phenotypes is not 9:3:3:1

The test statistic is calculated to be 134.7. The p-value is the area to the right of 134.7 under a χ² distribution with 3 degrees of freedom. There is some non-zero area to the right of 134.7, but for all intents and purposes this area is 0: there is essentially no chance of observing a test statistic this large if the true ratio is 9:3:3:1.
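The expected counts in Example 13.2 come from splitting the 381 plants according to the 9:3:3:1 ratio. A minimal Python sketch of that step and of the resulting statistic (variable names are illustrative):

```python
# Example 13.2: observed phenotype counts in the second generation
observed = [284, 21, 21, 55]   # purple/long, purple/round, red/long, red/round
ratio = [9, 3, 3, 1]           # hypothesized 9:3:3:1 ratio under H0

n = sum(observed)              # 381 plants
parts = sum(ratio)             # 16

# Each expected count is the category's share of the ratio times the sample size
expected = [n * r / parts for r in ratio]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(expected)                # [214.3125, 71.4375, 71.4375, 23.8125]
print(round(chi_sq, 1), df)    # 134.7 3
```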

This experiment gave extremely strong evidence that the inheritance was not following Mendelian inheritance laws under
independent assortment.

13.2.2 Testing Goodness-of-Fit for Specific Parametric Distributions

In some probability and statistical inference scenarios discussed in previous chapters, we have assumed that a sample came from a specific probability distribution, such as the binomial, Poisson, or normal distribution. We can carry out a χ² goodness-of-fit test to investigate this assumption.
Example 13.3 Two seeds are planted in each of 100 pots. The number of seeds that germinate in each pot is recorded, with the
following results.
Number that germinate 0 1 2
Number of pots 16 14 70

H0: The number of germinating seeds in a pot follows a binomial distribution
Ha: The distribution differs from the binomial in some way

Output from statistical software S-Plus:

Chi square Goodness of Fit Test

data: seeds
Chi-square = 36.5714, df = 1, p-value = 0
alternative hypothesis: True cdf does not equal the binomial distribution for at least one sample point

With such a small p-value, we have very strong evidence that the number of germinating seeds does not follow a binomial distribution.

To find the expected counts under the null hypothesis, we first need the best estimate of p, the probability of success on a single trial in a binomial setting. The estimate p̂ is the sample proportion of germinating seeds:

$$\hat{p} = \frac{\text{number of seeds germinated}}{\text{total number of seeds}} = \frac{(0 \times 16) + (1 \times 14) + (2 \times 70)}{200} = 0.77$$

This estimate is used to obtain the estimated probabilities and expected counts for each cell:

Number of germinating seeds   Expected proportion                             Expected number
0                             $\binom{2}{0}(0.77)^0(1 - 0.77)^2 = 0.0529$     (0.0529)(100) = 5.29
1                             $\binom{2}{1}(0.77)^1(1 - 0.77)^1 = 0.3542$     (0.3542)(100) = 35.42
2                             $\binom{2}{2}(0.77)^2(1 - 0.77)^0 = 0.5929$     (0.5929)(100) = 59.29

The test statistic is equal to 36.57. Since we used the data to estimate the parameter p, we lost an extra degree of freedom (one
degree of freedom is lost for every parameter estimated from the data). The appropriate degrees of freedom are 3 – 1 – 1 = 1.
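The steps of Example 13.3 (estimate p from the data, compute binomial expected counts, then lose one extra degree of freedom for the estimated parameter) can be sketched in Python as follows. This is an illustrative sketch assuming Python 3.8+ for `math.comb`; the variable names are hypothetical.

```python
import math

# Example 13.3: number of seeds (out of 2 per pot) that germinated, for 100 pots
counts = {0: 16, 1: 14, 2: 70}   # seeds germinated -> number of pots
trials = 2                       # seeds planted per pot
pots = sum(counts.values())      # 100

# Estimate p by the overall proportion of seeds that germinated
germinated = sum(k * c for k, c in counts.items())   # 154
p_hat = germinated / (trials * pots)                 # 0.77

# Expected counts under H0: binomial(2, p_hat) probability x number of pots
expected = {
    k: math.comb(trials, k) * p_hat ** k * (1 - p_hat) ** (trials - k) * pots
    for k in counts
}

chi_sq = sum((counts[k] - expected[k]) ** 2 / expected[k] for k in counts)

# One df lost because the counts must sum to 100, one more for estimating p
df = len(counts) - 1 - 1

print(round(p_hat, 2), round(chi_sq, 2), df)  # 0.77 36.57 1
```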

The p-value is 1.5 × 10⁻⁹. Since the p-value is tiny, we have extremely strong evidence against the null hypothesis. We can reject the null hypothesis at any reasonable level (and many unreasonable ones!). There is extremely strong evidence that the number of seeds that germinate in a pot does not follow a binomial distribution.

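Some of the binomial conditions are satisfied here: we have a fixed number of trials (two seeds per pot), and we are counting the number of successes. The independence assumption (or the assumption of a constant probability of germination) must therefore be violated.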

13.3 χ² Tests for Two-Way Tables

In two-way contingency tables the observations are classified according to two different categorical variables. These types of
problems are more common and interesting than one-way problems.

Here we are interested in investigating a possible relationship between variables.

Example 13.4 Is there a relationship between fatty fish consumption and the rate of prostate cancer? A study followed 6272 Swedish men for 30 years. The men were categorized according to their fish consumption and whether they developed prostate cancer.

                     Fish Consumption
                     Never/seldom   Small   Moderate   Large
Prostate cancer      14             201     209        42
No prostate cancer   110            2420    2769       507
% prostate cancer    11.3           7.7     7.0        7.7
Example 13.5 An experiment investigated the effect of two types of frontier medicine on the health of cardiac patients. All 748
cardiac patients in the study received at least the standard treatment, but some patients received additional treatments. Each
patient was randomly assigned to one of 4 groups:

1. Received only standard treatment


2. Prayers said for them by prayer groups
3. Received music, imagery, and touch (MIT) therapy
4. Received both interventions

Several variables were measured on each individual six months after the start of the intervention. One variable was six-month survival (whether the patient was still alive after six months).

         Intervention
         Prayer   MIT   Prayer and MIT   Neither
Dead     11       4     3                9
Alive    171      181   186              183
% Dead   6.0      2.2   1.6              4.7

Is there a significant difference in the six-month survival rates of the four groups?

In two-way tables, we often wish to test the null hypothesis that there is no relationship between the row and column variables. The
hypotheses will be phrased differently, depending on the sampling design.

If we have a single sample, and classify the individuals according to two variables (like example 13.4), our hypotheses will be:

H0: The row and column variables are independent
Ha: The row and column variables are dependent

If we have several samples from different populations, or an experiment with several different treatment groups (like example 13.5),
our hypotheses will be:

H0: The distribution of the response variable is the same for all of the populations
Ha: The distribution of the response differs in some way between populations

We are testing the null hypothesis that the row and column variables are NOT related (there is no association between the row and
column variables).

13.3.1 The χ² Test Statistic for Two-Way Tables

The same test statistic will be used for all two-way tables involving count data:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

The observed counts are found in the sample data. The expected counts are given by:

$$\text{Expected count} = \frac{\text{row total} \times \text{column total}}{\text{overall total}}$$

The degrees of freedom are:

$$DF = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$
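The two formulas above translate directly into code. In this minimal Python sketch, `expected_counts` and `degrees_of_freedom` are hypothetical helper names and the table is stored as a list of rows; applied to the fish-consumption data of Example 13.4, it should reproduce the expected counts listed for that example.

```python
def expected_counts(table):
    """Expected count for every cell: row total x column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall = sum(row_totals)
    return [[r * c / overall for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Example 13.4: rows = prostate cancer / no prostate cancer,
# columns = never/seldom, small, moderate, large fish consumption
fish = [[14, 201, 209, 42],
        [110, 2420, 2769, 507]]

expected = expected_counts(fish)
for row in expected:
    print([round(e, 2) for e in row])
print(degrees_of_freedom(fish))  # 3
```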


13.3.1.1 Expected Counts in Two-Way Tables

Example 13.6 Suppose we randomly sample 100 students at a university and categorize them according to gender and whether or not they have a car.

        Car   No car   Total
Men     10    30       40
Women   12    48       60
Total   22    78       100

Based on this sample, the estimated proportion of male students at this university is $\hat{p}_{\text{male}} = 0.40$, and the estimated proportion of car owners at this university is $\hat{p}_{\text{car}} = 0.22$.

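If being male and owning a car are independent, the proportion of students who are male AND own a car is estimated by $P(\text{male} \cap \text{car}) = P(\text{male}) \times P(\text{car}) = 0.40 \times 0.22 = 0.088$. The expected count of males who own cars is then (100)(0.088) = 8.8. This gives rise to the general form of the expected counts above, since (row total × column total)/overall total = (40 × 22)/100 = 8.8.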
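A quick numerical check that the independence argument and the general formula give the same expected count for Example 13.6 (a Python sketch; the variable names are illustrative):

```python
# Example 13.6: 100 students classified by gender and car ownership
n = 100
men_total = 40   # row total for men
car_total = 22   # column total for car owners

p_male = men_total / n   # 0.40
p_car = car_total / n    # 0.22

# If gender and car ownership are independent, P(male and car) = P(male) * P(car)
expected_via_independence = n * p_male * p_car
# The general two-way formula: row total x column total / overall total
expected_via_formula = men_total * car_total / n

print(round(expected_via_independence, 2), round(expected_via_formula, 2))  # 8.8 8.8
```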

Return to Example 13.4

Let’s test the hypotheses:

H0: Fish consumption and prostate cancer are independent
Ha: Fish consumption and prostate cancer are not independent

Expected counts:

                     Fish Consumption
                     Never/seldom   Small     Moderate   Large
Prostate cancer      9.21           194.74    221.26     40.79
No prostate cancer   114.79         2426.26   2756.74    508.21

The calculated test statistic is 3.6773. The degrees of freedom are (2 − 1)(4 − 1) = 3. The p-value is the area to the right of 3.677, which is approximately 0.30. Since this p-value is quite large, there is no evidence against the null hypothesis of independence. We have no evidence that fish consumption is related to the rate of prostate cancer in Swedish men.
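Putting the pieces together, the following Python sketch carries out the whole test for Example 13.4. The function names `chi2_sf` and `pearson_chi_square` are hypothetical helpers written for illustration; `chi2_sf` uses the closed-form right-tail area that holds for integer degrees of freedom. Treat this as a sketch, not a substitute for statistical software.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area of a chi-square distribution with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        series = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * series
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def pearson_chi_square(table):
    """Pearson chi-square test of independence for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


fish = [[14, 201, 209, 42],       # prostate cancer
        [110, 2420, 2769, 507]]   # no prostate cancer
stat, df, p_value = pearson_chi_square(fish)
print(round(stat, 3), df, round(p_value, 2))  # 3.677 3 0.3
```

Applied to the frontier-medicine table of Example 13.5, the same function should give a statistic of about 7.08 with 3 degrees of freedom and a p-value of about 0.07, in line with the R output for that example.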

Return to Example 13.5

Let’s test the hypotheses:

H0: The six-month survival rate is the same for all treatment groups
Ha: The six-month survival rates are not all equal

Alternatively:

$$H_0: p_1 = p_2 = p_3 = p_4$$
Ha: These probabilities are not all equal
where $p_i$ is the true six-month survival rate for the ith treatment type.

Output from R:

Pearson’s Chi-squared test


data: frontier_medicine
X-squared = 7.0766, df = 3, p-value = 0.0695

The evidence is not very strong and is not significant at the commonly chosen significance level of 0.05.
Example 13.7 Is there a relationship between education level and smoking status among French men? A study measured several
variables on 459 healthy men in France who were attending a clinic for a checkup.

                   Non-smoker   Ex-smoker   Moderate   Heavy
Primary school     56           54          41         36
Secondary school   37           43          27         32
University         53           28          36         16

When both variables have more than two levels, visualizing the relationship can be difficult, and it is helpful to plot the data, as in the graphs above. These plots do not show any dramatic differences in the distributions, but it appears as though those with a university education may be more likely to be non-smokers and less likely to be heavy smokers.

Is smoking status independent of education level?

H0: Smoking status and education level are independent
Ha: Smoking status and education level are not independent

Output from R:

Pearson’s Chi-squared test


data: French_smoking
X-squared = 13.305, df = 6, p-value = 0.03844

There is moderately strong evidence of an association between education level and smoking status in French men (moderately strong evidence that these variables are not independent). Significant evidence against the null hypothesis means there is strong evidence of an association between the row and column variables, but the nature of the relationship can be difficult to determine from the table of observed counts.

By looking at the figure above, we can see that the most obvious difference is that those with a university education are more likely to be non-smokers. We could investigate this in greater detail with other statistical inference techniques.
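One simple way to see which cells drive the result is to look at each cell's contribution, (Observed − Expected)²/Expected, to the total statistic, together with the sign of Observed − Expected. The Python sketch below does this for Example 13.7; it is only a descriptive aid, assumed here for illustration, and not one of the formal follow-up techniques referred to above.

```python
rows = ["Primary school", "Secondary school", "University"]
cols = ["Non-smoker", "Ex-smoker", "Moderate", "Heavy"]
table = [[56, 54, 41, 36],
         [37, 43, 27, 32],
         [53, 28, 36, 16]]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)  # 459 men

contribution = {}  # (row, column) -> (O - E)^2 / E
difference = {}    # (row, column) -> O - E, to show the direction
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        cell = (rows[i], cols[j])
        contribution[cell] = (observed - expected) ** 2 / expected
        difference[cell] = observed - expected

chi_sq = sum(contribution.values())  # the total Pearson statistic

# The four cells contributing most to the statistic
for cell in sorted(contribution, key=contribution.get, reverse=True)[:4]:
    direction = "more than expected" if difference[cell] > 0 else "fewer than expected"
    print(cell, round(contribution[cell], 2), direction)
```

For these data, the largest contributions should come from the university row: fewer heavy smokers and more non-smokers than expected under independence, consistent with the plots.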

13.4 A Few More Points

13.4.1 Relationship between the Z Test and the χ² Test for 2×2 Tables

There is a direct relationship between the χ² test applied to a 2×2 table and the Z test for proportions discussed in Section 11.5. To illustrate, let's return to Example 11.4, in which an experiment investigated whether a vitamin C supplement helps to reduce the incidence of colds.

            Cold   No cold   Total
Placebo     335    76        411
Vitamin C   302    105       407

To test the null hypothesis that the population proportions are equal (vitamin C has no effect), we can use either the Z test or the χ² test. When we first encountered this problem, we calculated the Z statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE_0(\hat{p}_1 - \hat{p}_2)} = 2.517$$

The resulting p-value is 0.0118. But we can also carry out a χ² test on the same data:

Pearson’s Chi-squared test


data: vitamin
X-squared = 6.3366, df = 1, p-value = 0.01183

Note that Z² = χ² (2.517² ≈ 6.336), and the p-values are equal. This is an exact relationship: for 2×2 tables, the two-sided Z test for proportions and the χ² test are equivalent tests.
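The equivalence can be checked numerically. This Python sketch computes the pooled Z statistic and the Pearson χ² statistic for the vitamin C data; the χ² statistic here is the plain Pearson statistic with no continuity correction, which is what matches the output above. (R's `chisq.test` applies a continuity correction to 2×2 tables by default unless `correct = FALSE` is given.) Variable names are illustrative.

```python
import math

# Example 11.4: rows = placebo / vitamin C, columns = cold / no cold
table = [[335, 76],
         [302, 105]]

# Z test for two proportions, using the pooled proportion in SE_0
n1, n2 = sum(table[0]), sum(table[1])            # 411, 407
p1_hat, p2_hat = table[0][0] / n1, table[1][0] / n2
pooled = (table[0][0] + table[1][0]) / (n1 + n2)
se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se0
p_z = math.erfc(abs(z) / math.sqrt(2))           # two-sided p-value

# Pearson chi-square statistic for the same 2x2 table (no continuity correction)
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)
chi_sq = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (table[i][j] - expected) ** 2 / expected
p_chi = math.erfc(math.sqrt(chi_sq / 2))         # right tail of chi-square with 1 df

print(round(z, 3), round(z ** 2, 4), round(chi_sq, 4))
print(round(p_z, 4), round(p_chi, 4))
```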

13.4.2 Assumptions

The test is appropriate only if each individual appears in one and only one cell. The following example violates this assumption:

Example 13.8 At a resort, 200 guests were asked which amenities they used during their stay. Individuals were able to use more than one of the amenities, and many individuals appear in more than one cell (the total of all cells is 374, so on average each of the 200 people used 1.87 amenities). The χ² test would not be appropriate.

Amenity        Men   Women
Pool           52    68
Spa            26    44
Concierge      41    32
Room Service   45    20
None           12    34

The assumptions of X2 tests for count data are minimal:


1. The sample or samples are simple random samples from the populations of interest
2. The sample size is large enough for the χ² approximation to be reasonable

The test doesn't perform well if the expected counts are very small. Some statisticians have said that the test should not be used if any expected count is less than 5. A rougher guideline: the test is reasonable if the average expected count is at least 5, no expected count is very small (less than one, say), and not too many expected counts are less than 5.
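The rough guideline above can be expressed as a simple check. In this Python sketch, the function name `expected_counts_ok` is hypothetical, and the cutoff for "not too many" expected counts below 5 is an assumption: it uses 20%, a commonly quoted rule of thumb that the text itself does not specify.

```python
def expected_counts_ok(expected, max_small_fraction=0.2):
    """Rough check of the sample-size guideline for a chi-square test.

    expected: flat list of expected counts, one per cell.
    max_small_fraction: ASSUMED cutoff for "not too many" counts below 5
    (a commonly quoted rule of thumb, not a value from the text).
    """
    average_ok = sum(expected) / len(expected) >= 5   # average expected count at least 5
    none_tiny = all(e >= 1 for e in expected)         # no expected count below 1
    small_fraction = sum(e < 5 for e in expected) / len(expected)
    return average_ok and none_tiny and small_fraction <= max_small_fraction


print(expected_counts_ok([72.0, 26.0, 16.0, 86.0]))  # Example 13.1: True
print(expected_counts_ok([5.29, 35.42, 59.29]))      # Example 13.3: True
print(expected_counts_ok([0.1, 0.2, 0.3, 9.4]))      # first test in Table 13.6, n = 10: False
```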

The figure below shows histograms of 1,000,000 simulated values of the χ² test statistic. The values are simulated under the assumption that the null hypothesis is true. Plots on the left are for the test of H0: p1 = 0.01, p2 = 0.02, p3 = 0.03, p4 = 0.94, and plots on the right are for the test of H0: p1 = 0.1, p2 = 0.2, p3 = 0.3, p4 = 0.4. As the sample size increases, the true distribution of the test statistic approaches a χ² distribution.

If the χ² approximation is not reasonable, then the TRUE probability of a Type I error may be very different from the stated probability of a Type I error. If the value in the table below is close to the stated alpha level, then the true probability of a Type I error is close to the stated value. If the value differs, then our stated conclusions may be misleading.

Note that the true distribution of the test statistic is discrete, and we are approximating it with a continuous distribution. Approximating a discrete distribution with a continuous one can lead to unusual (and undesired) results. For example, for the first test with n = 10, the true probability of a Type I error is exactly the same for alpha = 0.05 as for alpha = 0.10. In this scenario, there are no possible values of the test statistic between 6.25 and 7.81, the critical values of the χ² distribution with 3 degrees of freedom corresponding to alpha = 0.10 and alpha = 0.05. This can be seen in the gap in Figure 13.6a. This problem is only a major issue when the expected counts are TOO SMALL for the approximation to be reasonable. As the sample size increases, the approximation improves and this effect is no longer obvious.

Table 13.6: Estimates of the true probability of a Type I error.


              H0: p1 = 0.01, p2 = 0.02, p3 = 0.03, p4 = 0.94     H0: p1 = 0.1, p2 = 0.2, p3 = 0.3, p4 = 0.4
Sample size   Stated alpha = 5%   Stated alpha = 10%             Stated alpha = 5%   Stated alpha = 10%
n = 10        14.2                14.2                           4.6                 9.0
n = 50        5.7                 7.4                            4.8                 9.7
n = 100       5.5                 8.6                            4.9                 9.8
n = 1000      5.0                 9.8                            5.0                 10.0
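The discreteness problem can be seen directly by enumerating every possible outcome. For the first test in Table 13.6 with n = 10, this Python sketch lists all count vectors, computes the exact multinomial probability of each under H0, and adds up the probability of rejection. The helper name `null_distribution` is hypothetical, and 6.2514 and 7.8147 are the χ² critical values (3 degrees of freedom) for alpha = 0.10 and 0.05. The exact rejection probability should agree with the 14.2% reported in the table at both alpha levels, and no possible statistic should fall between the two critical values.

```python
import math
from itertools import product


def null_distribution(p0, n):
    """Every possible outcome of a one-way chi-square test with sample size n.

    Returns (statistic, probability) pairs, where the probability is the exact
    multinomial probability of that vector of counts when H0: p = p0 is true.
    """
    expected = [n * p for p in p0]
    outcomes = []
    for head in product(range(n + 1), repeat=len(p0) - 1):
        last = n - sum(head)
        if last < 0:
            continue  # counts must add up to n
        counts = head + (last,)
        stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
        coef = math.factorial(n) // math.prod(math.factorial(o) for o in counts)
        prob = coef * math.prod(p ** o for p, o in zip(p0, counts))
        outcomes.append((stat, prob))
    return outcomes


# First test in Table 13.6, with n = 10
p0 = (0.01, 0.02, 0.03, 0.94)
dist = null_distribution(p0, 10)

CRIT_10, CRIT_05 = 6.2514, 7.8147  # chi-square critical values, 3 df

rate_10 = sum(prob for stat, prob in dist if stat >= CRIT_10)
rate_05 = sum(prob for stat, prob in dist if stat >= CRIT_05)
gap = [stat for stat, _ in dist if CRIT_10 < stat < CRIT_05]

print(round(rate_05, 3), round(rate_10, 3), gap)  # 0.142 0.142 []
```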
