13.1 Introduction
χ² tests involve only hypothesis tests, not confidence intervals, and there is (essentially) one formula for the test statistic in all of the
different situations. The calculations are straightforward but can be long and tedious, so we often use software to do the
calculations for us.
Example 13.1. Does the distribution of ABO blood type among sufferers of pancreatic cancer differ from the distribution of blood
type for the general population? A study investigated a possible relationship between blood type and pancreatic cancer by looking at
the data from a large sample of nurses in the US. The table below illustrates the breakdown of ABO blood type for a sample of 200
nurses that developed pancreatic cancer.
For the population of all nurses in the US, 36% have blood type A, 13% have blood type B, 8% have blood type AB, and 43% have
blood type O. We will assume that these values represent the true distribution of blood types for American nurses.
Blood Type    A      B      AB     O      Total
Count         71     36     23     70     200
Percentage    35.5   18.0   11.5   35.0   100
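We can test the null hypothesis that the distribution of ABO blood type for nurses with pancreatic cancer is the same as for the
population of nurses:
Ho: pA = 0.36, pB = 0.13, pAB = 0.08, pO = 0.43
where pA, pB, pAB, and pO are the true proportions of each blood type among nurses with pancreatic cancer.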
The alternative hypothesis is that the hypothesized proportions are not all correct (in other words, that the distribution of blood
type among pancreatic cancer sufferers differs from the distribution of blood type for the population). We can test the null
hypothesis with a χ² test.
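In χ² tests for one-way tables, we test the null hypothesis that the true proportions for the c categories of a categorical variable are
equal to certain hypothesized values:
Ho: $p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_c = p_{c0}$
where $p_{i0}$ denotes the hypothesized proportion for category i.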
The test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
Observed represents the observed sample counts in the various categories. Expected represents the count we would expect to get,
on average, if the null hypothesis were true. For each category:
Expected count = Hypothesized proportion × Sample size
If the null hypothesis is true, the test statistic will have an approximate chi-squared distribution. For this type of test, the degrees of
freedom are the number of categories minus one. This approximation works best for large sample sizes; the assumptions of the test are
discussed in Section 13.4.2.
If the observed counts are close to the expected counts, then the test statistic will be very small. If they are very different, the test
statistic will be very large. The larger the value of the test statistic, the greater the evidence against the null hypothesis.
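For Example 13.1, the expected counts are 200 × 0.36 = 72, 200 × 0.13 = 26, 200 × 0.08 = 16, and 200 × 0.43 = 86, which gives a test
statistic of approximately 9.90. Since there are 4 squared terms, there are 4 − 1 = 3 degrees of freedom. The p-value, the area to the
right of the test statistic under the χ² distribution with 3 degrees of freedom, is illustrated below.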
Using software, we can find that the p-value is 0.0194. (Using a χ² table, we can find only that the p-value falls in an interval, such as
0.01 < p-value < 0.025.)
The p-value is small, indicating fairly strong evidence against the null hypothesis (small enough to be significant at the commonly
chosen significance level of 0.05).
This means that there is fairly strong evidence that the distribution of blood types for American nurses with pancreatic cancer differs
from the distribution of blood types for the entire population of American nurses. In short, there is some evidence of an association
between blood type and pancreatic cancer.
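The calculation for Example 13.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in these notes: the variable names are our own, and chi2_sf_df3 is a hypothetical helper that relies on a closed-form expression for the right-tail area that holds only for 3 degrees of freedom. (If SciPy is installed, scipy.stats.chisquare carries out the same kind of test.)

```python
import math

def chi2_sf_df3(x):
    """Area to the right of x under a chi-squared curve with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

observed = [71, 36, 23, 70]                 # counts for blood types A, B, AB, O
hypothesized = [0.36, 0.13, 0.08, 0.43]     # population proportions under Ho
n = sum(observed)

# Expected count = hypothesized proportion x sample size
expected = [p * n for p in hypothesized]

# Sum of (Observed - Expected)^2 / Expected over all cells
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf_df3(statistic)

print(f"chi-squared = {statistic:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Run as written, this should reproduce a statistic of about 9.90 and a p-value that agrees with the 0.0194 quoted above.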
Example 13.2 In a famous genetics experiment in 1905, William Bateson and Reginald Punnett investigated inheritance in sweet
peas. Purple flowers and long pollen grains are dominant traits, so the peas resulting from the first-generation cross all had purple
flowers and long grains. Under Mendelian inheritance with independent assortment, crossing these first-generation plants should
result in a 9:3:3:1 phenotypic ratio in the second generation.
                  Phenotype
                  Purple/long           Purple/round   Red/long   Red/round
Observed count    284                   21             21         55
Expected count    9/16 x 381 = 214.3    71.4           71.4       23.8
There appear to be large differences between the observed and expected counts. Let's carry out a hypothesis test of the
hypotheses:
Ho: The true phenotypic ratio is 9:3:3:1
Ha: The true phenotypic ratio is not 9:3:3:1
The test statistic is calculated to be 134.7. The p-value is the area to the right of 134.7 under a χ² distribution with 3 degrees of
freedom. There is some non-zero area to the right of 134.7, but for all intents and purposes, this area is 0. There is essentially no
chance of observing data like these if the true ratio is 9:3:3:1.
This experiment gave extremely strong evidence that the inheritance was not following Mendelian inheritance laws under
independent assortment.
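As a check on the arithmetic, the test statistic for this example can be written out term by term from the observed and (rounded) expected counts in the table:

$$\chi^2 = \frac{(284 - 214.3)^2}{214.3} + \frac{(21 - 71.4)^2}{71.4} + \frac{(21 - 71.4)^2}{71.4} + \frac{(55 - 23.8)^2}{23.8} \approx 22.7 + 35.6 + 35.6 + 40.8 = 134.7$$

Every one of the four categories contributes a large term, so the lack of fit is not driven by a single phenotype.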
In some probability and statistical inference scenarios discussed in previous chapters, we have assumed that the sample came from a
specific probability distribution, such as the binomial, Poisson, or normal distribution. We can carry out a χ² goodness-of-fit test to
investigate this assumption.
Example 13.3 Two seeds are planted in each of 100 pots. The number of seeds that germinate in each pot is recorded, with the
following results.
Number that germinate 0 1 2
Number of pots 16 14 70
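We will test the null hypothesis that the number of seeds that germinate in a pot follows a binomial distribution (with n = 2 trials
per pot). As the calculations below show, the p-value is very small, giving very strong evidence that the number of germinating seeds
does not follow a binomial distribution.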
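To find the expected counts under the null hypothesis, we first need the best estimate of p, the probability of success on a single
trial in a binomial setting. The estimate $\hat{p}$ is the sample proportion of germinating seeds:
$$\hat{p} = \frac{0(16) + 1(14) + 2(70)}{2 \times 100} = \frac{154}{200} = 0.77$$
This estimate is used to obtain the estimated probabilities and expected counts for each cell:
Number that germinate     0        1        2
Estimated probability     0.0529   0.3542   0.5929
Expected count            5.29     35.42    59.29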
The test statistic is equal to 36.57. Since we used the data to estimate the parameter p, we lost an extra degree of freedom (one
degree of freedom is lost for every parameter estimated from the data). The appropriate degrees of freedom are 3 – 1 – 1 = 1.
The p-value is 1.5 × 10⁻⁹. Since the p-value is tiny, we have extremely strong evidence against the null hypothesis. We can reject the
null hypothesis at any reasonable level (and many unreasonable ones!). There is extremely strong evidence that the number of
seeds that germinate in a pot does not follow a binomial distribution.
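Some conditions of the binomial setting are satisfied here: we have a fixed number of trials (two seeds per pot) and we are counting
the number of successes. The poor fit suggests that the independence assumption is violated (for example, seeds in the same pot
share the same growing conditions).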
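The steps of Example 13.3 (estimate p, build the expected counts, then reduce the degrees of freedom by one for the estimated parameter) can be sketched as follows. This is an illustrative sketch with our own variable names; it assumes Python 3.8 or later for math.comb, and it uses the fact that for 1 degree of freedom the right-tail area of the χ² distribution equals erfc(sqrt(x/2)).

```python
import math

pots = {0: 16, 1: 14, 2: 70}     # number that germinate -> number of pots
seeds_per_pot = 2
n_pots = sum(pots.values())

# Estimate p as the overall proportion of seeds that germinated.
p_hat = sum(k * count for k, count in pots.items()) / (seeds_per_pot * n_pots)

# Binomial(2, p_hat) probabilities times the number of pots give the expected counts.
expected = {
    k: n_pots * math.comb(seeds_per_pot, k) * p_hat**k * (1 - p_hat) ** (seeds_per_pot - k)
    for k in pots
}

statistic = sum((pots[k] - expected[k]) ** 2 / expected[k] for k in pots)

# Categories minus one, minus one more for the parameter estimated from the data.
df = len(pots) - 1 - 1

# Right-tail area for a chi-squared distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"p_hat = {p_hat:.2f}, chi-squared = {statistic:.2f}, df = {df}, p-value = {p_value:.2e}")
```

Most of the statistic comes from the cell for zero germinating seeds, where 16 pots were observed but only about 5.3 are expected under the binomial model.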
In two-way contingency tables the observations are classified according to two different categorical variables. These types of
problems are more common and interesting than one-way problems.
Example 13.4 Is there a relationship between fatty fish consumption and the rate of prostate cancer? A study followed 6272 Swedish
men for 30 years. They were categorized according to their fish consumption, and according to whether they developed prostate cancer.
                      Fish Consumption
                      Never/seldom   Small   Moderate   Large
Prostate cancer       14             201     209        42
No prostate cancer    110            2420    2769       507
% prostate cancer     11.3           7.7     7.0        7.7
Example 13.5 An experiment investigated the effect of two types of frontier medicine on the health of cardiac patients. All 748
cardiac patients in the study received at least the standard treatment, but some patients received additional treatments. Each
patient was randomly assigned to one of 4 groups: prayer, MIT, prayer and MIT, or neither.
Several variables were measured on each individual six months after the start of the intervention. One variable was six-month
survival (whether the patient was still alive after six months)
            Intervention
            Prayer   MIT    Prayer and MIT   Neither
Dead        11       4      3                9
Alive       171      181    186              183
% Dead      6.0      2.2    1.6              4.7
Is there a significant difference in the six-month survival rates of the four groups?
In two-way tables, we often wish to test the null hypothesis that there is no relationship between the row and column variables. The
hypotheses will be phrased differently, depending on the sampling design.
If we have a single sample, and classify the individuals according to two variables (like Example 13.4), our hypotheses will be:
Ho: The row and column variables are independent
Ha: The row and column variables are not independent
If we have several samples from different populations, or an experiment with several different treatment groups (like Example 13.5),
our hypotheses will be:
Ho: The distribution of the response variable is the same for all of the populations
Ha: The distribution of the response differs in some way between populations
We are testing the null hypothesis that the row and column variables are NOT related (there is no association between the row and
column variables).
The same test statistic will be used for all two-way tables involving count data:
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
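The observed counts are found in the sample data. The expected counts are given by:
$$\text{Expected count} = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}}$$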
Example 13.6 Suppose we randomly sample 100 students at a university, and categorize them according to gender and whether or
not they have a car.
Based on this sample, the estimated proportion of male students at this university is $\hat{p}_{male} = 0.40$ and the estimated
proportion of car owners at this university is $\hat{p}_{car} = 0.22$.
If gender and car ownership are independent, the proportion of students that are male AND own a car is:
$P(A \cap B) = P(A) \times P(B) = P(\text{car}) \times P(\text{male}) = 0.22 \times 0.40 = 0.088$.
The expected count of males who own cars is then (100)(0.088) = 8.8. This gives rise to the general form of the expected
counts above.
Returning to Example 13.4, the expected counts are:
                      Fish Consumption
                      Never/seldom   Small     Moderate   Large
Prostate cancer       9.21           194.74    221.26     40.79
No prostate cancer    114.79         2426.26   2756.74    508.21
The calculated test statistic is 3.6773. The degrees of freedom are (2 − 1)(4 − 1) = 3. The p-value is the area to the right of 3.677
under a χ² distribution with 3 degrees of freedom, which is approximately 0.30. Since this p-value is quite large, there is no evidence
against the null hypothesis of independence. We have no evidence that fish consumption is related to the rate of prostate cancer in
Swedish men.
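The expected counts and test statistic for Example 13.4 can be reproduced from the observed table alone. The sketch below is illustrative only, with our own variable names; the p-value line uses a closed-form expression that is valid only for 3 degrees of freedom. (If SciPy is installed, scipy.stats.chi2_contingency performs this kind of test on a two-way table.)

```python
import math

observed = [
    [14, 201, 209, 42],        # prostate cancer
    [110, 2420, 2769, 507],    # no prostate cancer
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall_total = sum(row_totals)

# Expected count = (row total x column total) / overall total
expected = [[r * c / overall_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Right-tail area for a chi-squared distribution with 3 degrees of freedom.
p_value = (math.erfc(math.sqrt(statistic / 2))
           + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2))

for row in expected:
    print([round(e, 2) for e in row])
print(f"chi-squared = {statistic:.4f}, df = {df}, p-value = {p_value:.2f}")
```

The largest single contribution comes from the never/seldom prostate cancer cell (14 observed against about 9.2 expected), but it is not large enough on its own to produce a small p-value.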
For Example 13.5, the hypotheses are:
Ho: The six-month survival rate is the same for all treatment groups
Ha: The six-month survival rates are not all equal
Alternatively:
Ho: p1 = p2 = p3 = p4
Ha: These probabilities are not all equal
where pi is the true six-month survival rate for the ith treatment group.
Output from R:
The evidence is not very strong and is not significant at the commonly chosen significance level of 0.05.
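The test for Example 13.5 can also be carried out directly from the table. The sketch below is illustrative, with our own variable names. It builds the expected counts from the pooled death rate, which is the estimate of the common rate under Ho; this gives the same expected counts as (row total × column total) / overall total. The p-value line again uses a closed form valid only for 3 degrees of freedom.

```python
import math

groups = ["Prayer", "MIT", "Prayer and MIT", "Neither"]
dead = [11, 4, 3, 9]
alive = [171, 181, 186, 183]

sizes = [d + a for d, a in zip(dead, alive)]
pooled_death_rate = sum(dead) / sum(sizes)    # common death rate estimated under Ho

statistic = 0.0
for d, a, n in zip(dead, alive, sizes):
    expected_dead = n * pooled_death_rate
    expected_alive = n * (1 - pooled_death_rate)
    statistic += (d - expected_dead) ** 2 / expected_dead
    statistic += (a - expected_alive) ** 2 / expected_alive

df = (2 - 1) * (len(groups) - 1)

# Right-tail area for a chi-squared distribution with 3 degrees of freedom.
p_value = (math.erfc(math.sqrt(statistic / 2))
           + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2))

print(f"chi-squared = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The resulting p-value falls between 0.05 and 0.10, consistent with the conclusion that the evidence is not significant at the 0.05 level.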
Example 13.7 Is there a relationship between education level and smoking status among French men? A study measured several
variables on 459 healthy men in France who were attending a clinic for a checkup.
When both variables have more than two levels, visualizing the relationship can be difficult, and it is helpful to plot graphs like those above.
These plots do not show any dramatic differences in the distributions, but it appears as though those with a university education may
be more likely to be non-smokers and less likely to be heavy smokers.
Output from R:
There is moderately strong evidence of an association between education level and smoking status in French men (moderately
strong evidence that these variables are not independent). Significant evidence against the null hypothesis means that there is
evidence of an association between the row and column variables, but the nature of the relationship can be difficult to determine
from the table of observations.
By looking at the figure above, we can see that the most obvious difference is that those with a university education are more likely to
be non-smokers. We could investigate this in greater detail with other statistical inference techniques.
There is a direct relationship between the χ² test applied to a 2x2 table and the Z test for proportions discussed in Section 11.5. To
illustrate, let's return to Example 11.4, in which an experiment investigated whether a vitamin C supplement helps to reduce the
incidence of colds.
To test the null hypothesis that the population proportions are equal (vitamin C has no effect), we can use either the Z test or the χ²
test. When we first encountered this problem, we calculated the Z statistic:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{SE_0(\hat{p}_1 - \hat{p}_2)} = 2.517$$
The resulting p-value is 0.0118. But we can also carry out a χ² test on the same data:
Note that Z² = χ² (2.517² = 6.336), and the p-values are equal. This is an exact relationship. For 2x2 tables, the Z test for
proportions and the χ² test are equivalent tests.
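The equivalence can be checked numerically. The sketch below uses hypothetical counts chosen for illustration (they are NOT the data from Example 11.4) and our own variable names. It computes the pooled two-sample Z statistic and the χ² statistic for the corresponding 2x2 table, along with the two-sided Z p-value and the χ² p-value with 1 degree of freedom.

```python
import math

# Hypothetical counts, NOT the data from Example 11.4.
x1, n1 = 40, 200    # number with the outcome and sample size, group 1
x2, n2 = 60, 200    # number with the outcome and sample size, group 2

# Z test for two proportions, using the pooled proportion in the standard error.
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se0

# Chi-squared test on the same data arranged as a 2x2 table.
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall_total = sum(row_totals)
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / overall_total) ** 2
    / (row_totals[i] * col_totals[j] / overall_total)
    for i in range(2)
    for j in range(2)
)

p_z = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value for the Z test
p_chi2 = math.erfc(math.sqrt(chi2 / 2))     # right-tail area, chi-squared with 1 df

print(f"Z^2 = {z**2:.4f}, chi-squared = {chi2:.4f}")
print(f"Z p-value = {p_z:.4f}, chi-squared p-value = {p_chi2:.4f}")
```

Note that the equivalence is with the two-sided Z test: the χ² statistic squares away the sign of the difference, so it cannot distinguish the direction of the effect.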
13.4.2 Assumptions
The test is appropriate only if each individual appears in one and only one cell. The following example violates this assumption:
Example 13.8 At a resort, 200 guests were asked which amenities they used during their stay. Individuals were able to use more than
one of the amenities, and many individuals appear in more than one cell (the total of all cells is 374, so on average each one of the
200 people used 1.87 amenities). The χ² test would not be appropriate.
The test doesn't perform well if the expected counts are very small. Some statisticians have said that the test should not be used if any
expected count is less than 5. A rough guideline: the test is reasonable if the average expected count is at least 5, there are no
expected counts that are very small (less than one, say), and there are not too many expected counts less than 5.
The figure below shows histograms of 1,000,000 simulated values of the χ² test statistic. The values are simulated under the
assumption that the null hypothesis is true. The plots on the left are for the test of Ho: p1 = 0.01, p2 = 0.02, p3 = 0.03, p4 = 0.94, and
the plots on the right are for the test of Ho: p1 = 0.1, p2 = 0.2, p3 = 0.3, p4 = 0.4. As the sample size increases, the true distribution of
the test statistic approaches a χ² distribution.
If the χ² approximation is not reasonable, then the TRUE probability of a Type I error may be very different from the stated
probability of a Type I error. If the value in the table below is close to the stated alpha level, then the true probability of a Type I
error is close to the stated value. If the value differs, then our stated conclusions may be misleading.
Note that the true distribution of the test statistic is discrete, and we are approximating it with a continuous distribution. The nature
of approximating a discrete distribution with a continuous one can lead to unusual (and undesired) results. For example, note that
for the first test, the true probability of a Type I error is exactly the same for alpha = 0.05 as for alpha = 0.10. In the given scenario,
there are no possible values of the test statistic between 6.25 and 7.81, the critical values of the χ² distribution corresponding to
alpha = 0.10 and alpha = 0.05. This can be seen in the gap in Figure 13.6a. This problem is only a major issue when the expected counts
are TOO SMALL for the approximation to be reasonable. When the sample size increases, the approximation improves and this
effect is no longer obvious.
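For small samples, the true probability of a Type I error can be computed exactly by enumerating every possible table of counts rather than by simulation. The sketch below is an illustration under stated assumptions: the function true_type1_error is our own hypothetical helper, and the sample size n = 50 is an arbitrary choice (the sample sizes behind the figure and table are not given here). It uses the two null hypotheses described above and the critical values 6.25 and 7.81, and requires Python 3.8 or later for math.prod.

```python
import math

def true_type1_error(probs, n, critical_value):
    """Exact P(statistic > critical_value) when Ho is true, for 4 categories."""
    expected = [p * n for p in probs]
    n_factorial = math.factorial(n)
    total = 0.0
    # Enumerate every possible table of counts (a, b, c, d) with a + b + c + d = n.
    for a in range(n + 1):
        for b in range(n - a + 1):
            for c in range(n - a - b + 1):
                counts = (a, b, c, n - a - b - c)
                statistic = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
                if statistic > critical_value:
                    # Multinomial probability of this table under Ho.
                    ways = n_factorial
                    for k in counts:
                        ways //= math.factorial(k)
                    total += ways * math.prod(p ** k for p, k in zip(probs, counts))
    return total

n = 50                                  # hypothetical sample size (an assumption)
sparse = [0.01, 0.02, 0.03, 0.94]       # Ho for the plots on the left
balanced = [0.1, 0.2, 0.3, 0.4]         # Ho for the plots on the right

for label, probs in [("sparse", sparse), ("balanced", balanced)]:
    for alpha, critical in [(0.10, 6.25), (0.05, 7.81)]:
        error = true_type1_error(probs, n, critical)
        print(f"{label:8s} stated alpha = {alpha:.2f}, true Type I error = {error:.4f}")
```

Comparing the printed values with the stated alpha levels shows how well the χ² approximation is working for each null hypothesis at the chosen sample size; changing n shows how the agreement depends on the expected counts.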