ANALYSIS
EDON M. ONTO
Graduate School Department, Notre Dame of Marbel University
9506 Alunan Ave., Koronadal City, South Cotabato
edon.onto@gmail.com
ABSTRACT
Suppose you want to test whether multiple samples of a categorical variable came from the same
distribution. An approximate test of independence for categorical variables such as the chi-square test
works well if the sample is large, but if it is not, you may want to use Fisher’s exact test. Fisher’s exact
test of independence for categorical variables is very effective and a good complement to the chi-square
test when the sample size is small. This paper provides students and researchers with a test for independence of categorical
INTRODUCTION
This paper aims to provide readers with the necessary information on where, when, and how to use
This paper also aims to provide a reference manual for implementing Fisher’s exact test in
In inferential studies, the ideal is to have a large sample because it tends to represent the population
well. A study with a small sample, on the other hand, is less reliable because the sample may not represent
the population. Despite this, studies with small samples are common and are often useful in practice, so
inferential methods should be able to handle them. This is where tests like Fisher’s exact test can be very
useful. This paper emphasizes how to use and how to implement Fisher’s exact test for the analysis of
categorical data.
This paper shows the importance and the use of Fisher’s exact test in categorical data analysis
for 2x2 contingency tables. Although the test is applicable to m x n contingency tables, the details of that will not
Examples are given to illustrate the topic. Most of the example is an
Framework
Figure 1
METHODOLOGY
Fisher’s exact test is named after its inventor, Ronald Fisher. It is called exact because it computes the
significance of the deviation from a null hypothesis exactly rather than relying on an approximation. Although it is
When to use it
Use Fisher’s exact test when at least one of the expected values in the 2x2 contingency table is less
than 5; otherwise, an approximate test of independence such as the chi-square test is generally adequate.
The test is used to check the independence of two nominal or categorical variables, when you want to see
whether the proportions of one variable differ depending on the value of the other variable. For example, the
data in Table 1 compare intramuscular magnesium injections with placebo for the treatment of chronic
fatigue syndrome[1]. Of the 9 patients who had the intramuscular magnesium injections, 8 felt better (89%),
whereas of the 12 on placebo, only 2 felt better (17%), which seems promising. Fisher’s exact test will
tell whether this difference between 89% and 17% is statistically significant.
Table 1 (expected counts under independence, rounded, in parentheses)
Outcome               Magnesium   Placebo   Total
Felt better           8 (4)       2 (6)     10
Did not feel better   1 (5)       10 (6)    11
Total                 9           12        21
For this example, Fisher’s exact test is more reliable than the chi-square test due to the small sample size and
Since Fisher’s exact test is used for contingency tables, it tests for a relationship between categorical
variables. The null hypothesis is that one variable is independent of the second variable; in other words, the
proportions of one variable are the same for different values of the second variable[2]. In the chronic fatigue
syndrome example, the null hypothesis is that the chance of getting better from chronic fatigue
syndrome is the same whether the subject receives magnesium injections or placebo. In other
words, getting better from chronic fatigue syndrome is independent of receiving magnesium injections
or placebo.
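
The rule of thumb on expected values can be checked directly. The following is a minimal R sketch, assuming the counts of Table 1; the object names (observed, expected, use_fisher) are illustrative and not part of the original paper.

```r
# Observed counts from Table 1
# (rows: felt better / did not feel better; columns: magnesium / placebo)
observed <- matrix(c(8, 2,
                     1, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Outcome = c("Felt better", "Did not feel better"),
                                   Treatment = c("Magnesium", "Placebo")))

# Expected count of each cell under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))

# Rule of thumb from the text: prefer Fisher's exact test
# when at least one expected count is below 5
use_fisher <- any(expected < 5)
print(use_fisher)
```

Rounded to whole numbers, the expected counts agree with the values shown in parentheses in Table 1.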
Hyper-Geometric Distribution
Computation
Let us introduce some notation for the chronic fatigue syndrome example. We represent the cells with a, b, c, and d.
Table 2
Outcome               Magnesium   Placebo   Total
Felt better           a           b         a+b
Did not feel better   c           d         c+d
Total                 a+c         b+d       a+b+c+d=n
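
With the notation of Table 2, the hypergeometric probability of a table with fixed marginal totals can be written in the general form below. This is the standard expression, stated here as a reading aid; the symbols a, b, c, d, and n are those of Table 2, and the computation for the chronic fatigue syndrome example is the special case a = 8, b = 2, c = 1, d = 10.

```latex
\[
p = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}}
  = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}
\]
```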
\[
p = \frac{\binom{9}{8}\binom{12}{2}}{\binom{21}{10}} = \frac{10!\,11!\,9!\,12!}{8!\,2!\,1!\,10!\,21!} = 0.0016841
\]
The above formula gives the exact probability of observing this arrangement of the data, given the
marginal totals, under the null hypothesis that the magnesium injection and the placebo
are equally effective in treating chronic fatigue syndrome. In other words, if we let p be the probability that
a subject feels better with the magnesium injection, then p is also the probability that a subject
feels better with placebo; and if subjects enter our sample independently of whether they felt better or
not, the hypergeometric formula gives the conditional probability of observing the values a, b, c, d in the
In the chronic fatigue syndrome example, the probability of the observed data is 0.0016841. According to
Fisher, however, to calculate the significance of the observed data under the null hypothesis, we must also
consider the more extreme tables that have the same marginal totals and depart from the null hypothesis in
the same direction as the observed data; that is, we add the probability of the observed data and the
probabilities of the more extreme data. For the chronic fatigue syndrome example, there is only one more
extreme table, that is, all of the subjects in magnesium injection
Table 3
Outcome               Magnesium   Placebo   Total
Felt better           9           1         10
Did not feel better   0           11        11
Total                 9           12        21
with probability 0.00003402. The significance of the observed data for our chronic fatigue syndrome example is
0.0016841 + 0.00003402 = 0.00171812, which is the one-tailed p-value. In statistical software like R, this
data for the null hypothesis (that there is no difference in the proportions of subjects that felt better between
magnesium injection and placebo treatment). The smaller the value of p, the greater the evidence for
rejecting the null hypothesis; so here the evidence is strong that subjects treated with magnesium injection
and subjects treated with placebo are not equally likely to feel better.
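
The one-tailed calculation can be reproduced in R. The sketch below assumes the counts of Table 1 and uses only base R functions (choose(), dhyper(), fisher.test()); the variable names are illustrative, and alternative = "greater" is the direction that matches this particular cell layout.

```r
# Margins of Table 1: 9 magnesium subjects, 12 placebo subjects, 10 who felt better
n_magnesium <- 9
n_placebo   <- 12
n_better    <- 10

# Probability of the observed table (a = 8), written with binomial coefficients
p_observed <- choose(9, 8) * choose(12, 2) / choose(21, 10)

# The same value from the hypergeometric density dhyper(x, m, n, k)
p_observed_dhyper <- dhyper(8, n_magnesium, n_placebo, n_better)

# Probability of the only more extreme table (a = 9, as in Table 3)
p_extreme <- dhyper(9, n_magnesium, n_placebo, n_better)

# One-tailed p-value: observed table plus the more extreme one
p_one_tailed <- p_observed + p_extreme

# Cross-check with fisher.test()
cfs <- rbind(c(8, 2), c(1, 10))
p_fisher <- fisher.test(cfs, alternative = "greater")$p.value

print(c(observed = p_observed, extreme = p_extreme,
        one_tailed = p_one_tailed, fisher_test = p_fisher))
```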
To calculate the two-sided p-value, we sum the probabilities of all tables that have the same marginal
totals as the observed data and a probability less than or equal to that of the observed table. In R, the p-value for two-sided
always use a two-sided test rather than a one-sided test unless you have a good reason.
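
The two-sided rule described above can be sketched in R as follows, again assuming the counts of Table 1. The small relative tolerance used when comparing probabilities is an assumption of this sketch, added to guard against floating-point ties; the variable names are illustrative.

```r
# Hypergeometric probabilities of every table sharing the margins of Table 1
# (a = number of magnesium subjects who felt better, from 0 to 9)
a_values <- 0:9
probs <- dhyper(a_values, 9, 12, 10)

# Probability of the observed table (a = 8)
p_observed <- probs[a_values == 8]

# Two-sided p-value: sum every probability no larger than the observed one
p_two_sided <- sum(probs[probs <= p_observed * (1 + 1e-7)])

# fisher.test() is two-sided by default
p_fisher <- fisher.test(rbind(c(8, 2), c(1, 10)))$p.value

print(c(manual = p_two_sided, fisher_test = p_fisher))
```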
RESULTS
In this section, we apply Fisher’s exact test to categorical data analysis. We provide some
categorical data examples and use them to illustrate how Fisher’s exact test works.
Example 1
Consider Fisher’s exact test with the famous tea-tasting example. At a summer tea party in
Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk
had the tea poured first or the milk poured first. An experiment was performed by Sir R.A. Fisher himself,
then and there, to see if her claim was valid. Eight cups of tea were prepared and presented to her in random
order. Four had the milk poured first, and four had the tea poured first. The lady tasted each one and rendered
The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four
of the cups are "tea first" and four are "milk first."
The null hypothesis is that the lady has no discerning ability, i.e. the four cups she calls “tea first”
To calculate the significance of the observed data under the null hypothesis, we list all the possible
tables with the same marginal totals and, out of those, consider only the ones that are
as extreme as or more extreme than the observed data. In this example the data can take five possible values:
(i)
4 0
0 4
(ii)
3 1
1 3
(iii)
2 2
2 2
(iv)
1 3
3 1
(v)
0 4
4 0
The probability that the observed data (ii) would occur is given by the hypergeometric distribution:
\[
p = \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} = \frac{4!\,4!\,4!\,4!}{3!\,1!\,1!\,3!\,8!} = 0.228571429
\]
There is only one table more extreme than the observed data, namely the one in which the lady selects
all the cups that are truly “tea first” (i), which has probability
\[
p = \frac{\binom{4}{4}\binom{4}{0}}{\binom{8}{4}} = \frac{4!\,4!\,4!\,4!}{4!\,0!\,0!\,4!\,8!} = 0.014285714
\]
The p-value for the observed data is the sum of the probabilities of the tables that are as extreme as or
more extreme than it, that is, 0.228571429 + 0.014285714 = 0.242857.
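
The five table probabilities and their one-tailed sum can be checked with the hypergeometric density in R. This is a minimal sketch; the labels (i) to (v) follow the list of tables above, and the variable names are illustrative.

```r
# Possible numbers of correctly identified "tea first" cups: 0 to 4
correct <- 0:4

# dhyper(x, m, n, k): 4 true "tea first" cups, 4 "milk first" cups,
# and 4 cups that the lady calls "tea first"
probs <- dhyper(correct, 4, 4, 4)
names(probs) <- c("(v)", "(iv)", "(iii)", "(ii)", "(i)")
print(round(probs, 9))

# One-tailed p-value: observed table (ii), 3 correct,
# plus the more extreme table (i), 4 correct
p_one_tailed <- sum(probs[correct >= 3])
print(p_one_tailed)
```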
Fisher’s exact test yields a one-tailed p-value = 0.242857, which is weak evidence
against the null hypothesis. In other words, we do not have sufficient evidence to reject the null
Since the null hypothesis is not rejected by the one-tailed test, a two-tailed test is unnecessary:
here the two-tailed p-value cannot be smaller than the one-tailed value.
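fisher.test(rbind(c(3,1), c(1,3)), alternative = "greater")$p.value
[1] 0.2428571
The fisher.test() function in R, called with alternative = "greater", yields a one-tailed p-value = 0.2428571,
which agrees with the manual calculation. With the default two-sided alternative, the same call returns 0.4857143.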
Example 2
A sample of teenagers might be divided into male and female on the one hand, and into those who are
studying for a statistics exam and those who are not on the other. We hypothesize that the proportion of studying
individuals is lower among the men than among the women, and we want to test whether any difference of
The observed counts are small, which makes Fisher’s exact test an appropriate
test of independence for these categorical data. To test the hypothesis that male students
study less than female students, we consider all the possible tables whose marginal totals are the
same as in the observed data. In this example there are 11 such tables, and the observed data correspond to
(ii) in Figure 3.
Figure 3
(i)            (vii)
0   10         6   4
12  2          6   8
(ii)           (viii)
1   9          7   3
11  3          5   9
(iii)          (ix)
2   8          8   2
10  4          4   10
(iv)           (x)
3   7          9   1
9   5          3   11
(v)            (xi)
4   6          10  0
8   6          2   12
(vi)
5   5
7   7
Table 6
Table   a    b    c    d    Probability
i       0    10   12   2    3.36519E-05
ii      1    9    11   3    0.001346076
iii     2    8    10   4    0.016657693
iv      3    7    9    5    0.088841028
v       4    6    8    6    0.2332077
vi      5    5    7    7    0.319827702
vii     6    4    6    8    0.2332077
viii    7    3    5    9    0.088841028
ix      8    2    4    10   0.016657693
x       9    1    3    11   0.001346076
xi      10   0    2    12   3.36519E-05
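
The probabilities in Table 6 can be regenerated with dhyper(). The sketch below assumes that a counts studying men, b studying women, c non-studying men, and d non-studying women; this reading is consistent with the stated hypothesis and with the marginal totals, but the layout is not spelled out in the surviving text. The column names are illustrative.

```r
# a = studying men, from 0 to 10; margins as in the observed table (ii):
# 12 men, 12 women, 10 studying individuals (assumed layout)
a <- 0:10
table6 <- data.frame(
  label = c("i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x", "xi"),
  a = a,
  b = 10 - a,   # studying women
  c = 12 - a,   # non-studying men
  d = 2 + a,    # non-studying women
  probability = dhyper(a, 12, 12, 10)
)
print(table6, digits = 9, row.names = FALSE)
```

Because all 11 tables with these margins are listed, the probability column sums to 1, which is a quick sanity check on the enumeration.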
To calculate the significance of the observed data under the null hypothesis, we consider only
the tables that are as extreme as or more extreme than the observed data in the same direction
as the observed data. In this example, these are (i) and (ii), with probabilities 0.0000336519 and 0.001346076
respectively (see Table 6). Thus, the significance of the observed data is 0.0000336519 + 0.001346076 =
0.001379728, which is the one-tailed p-value; it is highly significant, so we reject the null hypothesis that the
proportion of studying men is the same as the proportion of studying women.
For the two-tailed test in this example, we consider all tables in Figure 3 that are as extreme as or
more extreme than the observed data, that is, tables with probability less than or equal to that of the
observed data. In this example, these are (i), (ii), (x), and (xi), with probabilities 0.0000336519, 0.001346076,
0.001346076, and 0.0000336519 respectively. Their sum, 0.0000336519 + 0.001346076 + 0.001346076 +
0.0000336519 = 0.002759456, is the significance of the observed data under the null
hypothesis, which is highly significant. Thus, we reject the null hypothesis that the proportions of
studying men and studying women are equal. In other words, the proportion of studying men is significantly
In R, the two-tailed Fisher’s exact test can be obtained with the exact2x2() function
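
As a cross-check that needs no additional package, both p-values of this example can also be obtained with fisher.test() from base R. This sketch uses fisher.test() rather than exact2x2(), and it assumes the cell layout of table (ii) with studying individuals in the first row and men in the first column; under that layout, alternative = "less" corresponds to fewer studying men.

```r
# Observed table (ii); rows: studying / not studying, columns: men / women (assumed layout)
study <- rbind(c(1, 9),
               c(11, 3))

# One-tailed test in the direction "fewer studying men"
p_one_tailed <- fisher.test(study, alternative = "less")$p.value

# Two-tailed test (the default alternative)
p_two_tailed <- fisher.test(study)$p.value

print(c(one_tailed = p_one_tailed, two_tailed = p_two_tailed))
```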
CONCLUSION
We have shown that Fisher’s exact test is very useful in categorical data analysis, especially when
the sample size is small and approximate tests such as chi-square are not reliable. The probability of the
observed data follows the hypergeometric distribution. The p-value for
Fisher’s exact test is easy and fast to calculate with computer software such as R. In the “lady tasting tea”
experiment, the exact test showed that there is not enough evidence that the lady has discerning
ability; on the other hand, in the “men and women study habit” example, we have shown that the proportion of studying
ACKNOWLEDGEMENT
I would like to express my deepest and sincere thanks to my parents, family, and friends for their
unconditional support and motivation to finish this paper; to my very understanding and supportive
research adviser, Ma’am Danielle Vne A. Dondoyano, classmates, and MIM students; and most especially to
Almighty God for the knowledge, strength, and health to finish this paper.
REFERENCES
[1] Cox, I. M., Campbell, M. J., Dowson, D. (1991). Red blood cell magnesium and chronic fatigue syndrome.
[2] Agresti, A. (2002). “Categorical Data Analysis”. Florida: John Wiley and Sons, Inc.
[3] Fisher, R. A. (1922). “On the interpretation of χ² from contingency tables, and the calculation of P”.
[4] R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. R
[5] Fisher, R. A. (1956). “Mathematics of a lady Tasting Tea”. Courier Dover Publications. ISBN 978-0-
486-4115-4.