
FISHER’S EXACT TEST FOR CATEGORICAL DATA ANALYSIS

EDON M. ONTO
Graduate School Department, Notre Dame of Marbel University
9506 Alunan Ave., Koronadal City, South Cotabato
edon.onto@gmail.com

ABSTRACT

Suppose you want to test whether multiple samples of a categorical variable come from the same distribution. A test of independence for categorical variables such as the chi-square test works well when the sample is large, but when the sample is small its approximation becomes unreliable and Fisher’s exact test is the better choice. Fisher’s exact test of independence for categorical variables is an effective complement to the chi-square test when the sample size is small. This paper presents it to students and researchers as a test of independence for categorical data, especially when the sample is small.

INTRODUCTION

Objective of the Study

This paper aims to give readers the information needed to decide where, when, and how to use Fisher’s exact test.

It also aims to serve as a reference manual for carrying out Fisher’s exact test in categorical data analysis, whether by hand or with computer software.


Significance of the Study

In inferential studies, a large sample is ideal because it tends to represent the population well. A small sample is less reliable because it may not represent the population. Even so, studies with small samples are common and often useful in practice, so inferential methods must be able to handle them. This is where test statistics such as Fisher’s exact test are valuable. This paper shows how to apply and implement Fisher’s exact test in the analysis of categorical data, for the future use of readers.

Scope and limitation

This paper shows the importance and use of Fisher’s exact test in categorical data analysis for the 2x2 contingency table. The test also applies to m x n contingency tables, but those details are not covered in this paper.

Examples are given to illustrate the topic. Most of the examples use existing data from previous studies.

Framework

Figure 1

METHODOLOGY

Fisher’s Exact Test

Fisher’s exact test is named after its inventor, Ronald Fisher. It is exact because it computes the significance of the deviation from a null hypothesis directly rather than relying on an approximation. Although it is mostly employed when sample sizes are small, it is valid for all sample sizes.

When to use it

Use Fisher’s exact test when at least one of the expected values in the 2x2 contingency table is less than 5; otherwise, another test of independence such as the chi-square test is usually adequate. The test is used to assess the independence of two nominal or categorical variables, that is, whether the proportions of one variable differ depending on the value of the other variable. For example, the data in Table 1 compare intramuscular magnesium injections with placebo for the treatment of chronic fatigue syndrome[1]. Of the 9 patients who had the intramuscular magnesium injections, 8 felt better (89%), whereas of the 12 on placebo only 2 felt better (17%), which seems promising. Fisher’s exact test tells us whether this difference between 89% and 17% is statistically significant.

Table 1
                      Magnesium   Placebo   Total
Felt better           8 (4)       2 (6)     10
Did not feel better   1 (5)       10 (6)    11
Total                 9           12        21

For this example, Fisher’s exact test is more reliable than the chi-square test because the sample is small and at least one of the expected values (shown in parentheses in Table 1) is less than 5.
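The expected values can be checked directly before choosing a test. The following is a minimal sketch in base R, assuming the Table 1 counts are entered as a matrix; the object names (observed, expected, any_small) are illustrative and not part of the original analysis.

```r
# Observed counts from Table 1 (rows: outcome, columns: treatment)
observed <- matrix(c(8, 2,
                     1, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Outcome = c("Felt better", "Did not feel better"),
                                   Treatment = c("Magnesium", "Placebo")))

# Expected count of each cell under independence:
# (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))

# TRUE when any expected count is below 5, the rule of thumb used in this paper
any_small <- any(expected < 5)
print(any_small)
```

When any_small is TRUE, the rule of thumb stated above points to Fisher’s exact test rather than the chi-square test.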


Null Hypothesis

Since Fisher’s exact test is applied to a contingency table, it tests the relationship between categorical variables. The null hypothesis is that one variable is independent of the second variable; in other words, the proportions of one variable are the same for different values of the second variable[2]. In the chronic fatigue syndrome example, the null hypothesis is that the chance of getting better is the same whether the subject receives magnesium injections or placebo. In other words, getting better from chronic fatigue syndrome is independent of the treatment received.

Hyper-Geometric Distribution

Computation

Let us introduce some notation for the chronic fatigue syndrome example. We represent the cells by a, b, c, and d and the grand total by n.

Table 2
                      Magnesium   Placebo   Total
Felt better           a           b         a+b
Did not feel better   c           d         c+d
Total                 a+c         b+d       a+b+c+d=n

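Under the null hypothesis and with the marginal totals held fixed, the cell counts follow a hyper-geometric distribution, so the probability of the observed table is

\[
p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{\binom{a+b}{b}\binom{c+d}{d}}{\binom{n}{b+d}}
\]

and that is equivalent to

\[
p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}
\]

where \(\binom{n}{k}\) is a binomial coefficient and the symbol ! is the factorial operation[3]. For the chronic fatigue syndrome example (a = 8, b = 2, c = 1, d = 10, n = 21), this gives:

\[
p = \frac{\binom{9}{8}\binom{12}{2}}{\binom{21}{10}} = \frac{(10!)(11!)(9!)(12!)}{(8!)(2!)(1!)(10!)(21!)} = 0.0016841
\]

Here the binomial coefficients are written with the column totals (9 and 12) rather than the row totals; by the symmetry of the factorial form, both versions give the same value.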
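The equivalence of the binomial and factorial forms can be checked numerically. The sketch below uses only base R (choose(), lfactorial(), and dhyper()); the variable names are illustrative, and cc stands for cell c so that R's c() function is not masked.

```r
# Cell counts from Table 1, using the notation of Table 2
a <- 8; b <- 2; cc <- 1; d <- 10
n <- a + b + cc + d

# Binomial-coefficient form of the hyper-geometric probability
p_binom <- choose(a + b, a) * choose(cc + d, cc) / choose(n, a + cc)

# Factorial form, evaluated on the log scale to avoid overflow in larger tables
p_fact <- exp(lfactorial(a + b) + lfactorial(cc + d) +
              lfactorial(a + cc) + lfactorial(b + d) -
              lfactorial(a) - lfactorial(b) - lfactorial(cc) -
              lfactorial(d) - lfactorial(n))

# Built-in density: a "felt better" subjects among the a + cc magnesium patients,
# drawn from a + b subjects who felt better and cc + d who did not
p_dhyper <- dhyper(a, a + b, cc + d, a + cc)

print(c(binomial = p_binom, factorial = p_fact, dhyper = p_dhyper))
```

All three values should agree with the hand calculation of 0.0016841 up to rounding.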

The formula above gives the exact probability of observing this arrangement of the data, given the marginal totals, under the null hypothesis that the magnesium injection and placebo are equally effective in treating chronic fatigue syndrome. In other words, if the probability of feeling better is the same with magnesium injection as with placebo, and the subjects enter our sample independently of whether they felt better or not, the hyper-geometric formula gives the conditional probability of observing the values a, b, c, d in the four cells, conditional on the observed marginal totals.

In the chronic fatigue syndrome example, the probability of the observed data is 0.0016841. According to Fisher, however, to calculate the significance of the observed data under the null hypothesis we must also consider the more extreme tables that have the same marginal totals and depart from the null hypothesis in the same direction; that is, we add the probability of the observed data to the probabilities of the more extreme data. For the chronic fatigue syndrome example there is only one more extreme table, in which all of the subjects on magnesium injection felt better:

Table 3
                      Magnesium   Placebo   Total
Felt better           9           1         10
Did not feel better   0           11        11
Total                 9           12        21

This table has probability 0.00003402. The significance of the observed data for the chronic fatigue syndrome example is therefore 0.0016841 + 0.00003402 ≈ 0.001718, which is the one-tailed p-value. In statistical software such as R[4], this value can be obtained as fisher.test(rbind(c(8,1),c(2,10)), alternative="greater")$p.value. This value is the probability, if the null hypothesis is true (that is, if there is no difference in the proportions of subjects who felt better between magnesium injection and placebo treatment), of observing data as extreme as or more extreme than the data actually observed. The smaller the value of p, the greater the evidence for rejecting the null hypothesis; so here the evidence is strong that subjects treated with magnesium injection and subjects treated with placebo are not equally likely to feel better.
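The one-tailed sum described above can be reproduced by enumerating every table that shares the margins of Table 1. This is a sketch in base R under the assumption that the count of magnesium patients who felt better is taken as the free cell; the names a_values, probs, and p_one are illustrative.

```r
# Possible counts of magnesium patients who felt better, given the margins
# of Table 1 (9 magnesium, 12 placebo, 10 subjects who felt better)
a_values <- 0:9
probs <- dhyper(a_values, 9, 12, 10)

# One-tailed p-value: the observed table (a = 8) plus the more extreme one (a = 9)
p_one <- sum(probs[a_values >= 8])

# Compare with the built-in test
x <- rbind(c(8, 1), c(2, 10))
p_builtin <- fisher.test(x, alternative = "greater")$p.value
print(c(manual = p_one, fisher.test = p_builtin))
```

The two printed values should coincide, since fisher.test() sums the same hyper-geometric probabilities.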

To calculate the two-sided p-value, we sum the probabilities of all tables with the same marginal totals as the observed data whose probability is less than or equal to that of the observed table. In R, the two-sided p-value can be obtained with the exact2x2 package as exact2x2(rbind(c(8,1),c(2,10)), tsmethod="minlike")$p.value. Use the two-sided test rather than the one-sided test unless there is a good reason to do otherwise.

RESULTS

In this section, we apply Fisher’s exact test to categorical data. We work through some categorical data examples to show how Fisher’s exact test operates.

Example 1

Consider Fisher's exact test in the "famous" tea tasting example. At a summer tea party in Cambridge, England, a lady claimed to be able to discern, by taste alone, whether a cup of tea with milk had the tea poured first or the milk poured first. An experiment was performed by Sir R.A. Fisher himself, then and there, to see if her claim was valid. Eight cups of tea were prepared and presented to her in random order. Four had the milk poured first, and four had the tea poured first. The lady tasted each one and rendered her opinion[5]. The results are summarized in a 2 × 2 table:


Table 4
                 Lady says
Poured First     Tea First   Milk First   Total
Tea              3           1            4
Milk             1           3            4
Total            4           4            8

The row totals are fixed by the experimenter. The column totals are fixed by the lady, who knows that four

of the cups are "tea first" and four are "milk first."

The null hypothesis is that the lady has no discerning ability, i.e. the four cups she calls “tea first” are a random sample of the eight.

To calculate the significance of the observed data under the null hypothesis, we list all possible tables with the same marginal totals and then consider only those that are as extreme as or more extreme than the observed data. In this example the data can take five possible values given the same marginal totals; see Figure 2.


Figure 2

(i)
4 0
0 4

(ii)
3 1
1 3

(iii)
2 2
2 2

(iv)
1 3
3 1

(v)
0 4
4 0

The probability that the observed data (ii) would occur is given by the hyper-geometric distribution:

\[
p = \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} = \frac{(4!)(4!)(4!)(4!)}{(3!)(1!)(1!)(3!)(8!)} = 0.228571429
\]

Only one table is more extreme than the observed data, namely the one in which the lady selects all the cups that are truly “tea first” (i), which has probability

\[
p = \frac{\binom{4}{4}\binom{4}{0}}{\binom{8}{4}} = \frac{(4!)(4!)(4!)(4!)}{(4!)(0!)(0!)(4!)(8!)} = 0.014285714
\]

The p-value for the observed data is the sum of the probabilities of the observed table and the more extreme table, that is 0.228571429 + 0.014285714 = 0.242857. Fisher’s exact test thus yields a one-tailed p-value of 0.242857, which is weak evidence against the null hypothesis. In other words, we do not have sufficient evidence to reject the null hypothesis that the lady has no discerning ability.

Since the null hypothesis is not rejected by the one-tailed test, and the two-tailed p-value can only be larger, a two-tailed test is unnecessary.
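The five tables of Figure 2 and their probabilities can also be enumerated in one step. This is a minimal base-R sketch; the free cell is assumed to be the number of truly “tea first” cups that the lady calls “tea first”, and the names correct_tea, probs, and p_one are illustrative.

```r
# Number of truly "tea first" cups that the lady labels "tea first": 0 to 4
correct_tea <- 0:4

# 4 tea-first cups, 4 milk-first cups, 4 cups labelled "tea first"
probs <- dhyper(correct_tea, 4, 4, 4)

# Label each probability with its table in Figure 2
# (table (i) has 4 correct, table (v) has 0 correct)
names(probs) <- c("(v)", "(iv)", "(iii)", "(ii)", "(i)")
print(round(probs, 6))

# One-tailed p-value: observed table (ii) plus the more extreme table (i)
p_one <- sum(probs[correct_tea >= 3])
print(p_one)
```

Listing the whole distribution makes it clear that the middle table (iii) is by far the most likely outcome under the null hypothesis, which is why three correct cups out of four is weak evidence.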

In R, the p-value of Fisher’s exact test can be obtained with the fisher.test() function:

## one-tailed Fisher's exact test
fisher.test(rbind(c(3,1), c(1,3)), alternative="greater")$p.value
[1] 0.2428571

## two-tailed Fisher's exact test
## (for demonstration purposes only)
fisher.test(rbind(c(3,1), c(1,3)))$p.value
[1] 0.4857143

The fisher.test() function yields a one-tailed p-value of 0.2428571, which is the same as the manual calculation.

The fisher.test() function in R also accepts m x n tables.

Example 2

A sample of teenagers might be divided into male and female on the one hand, and into those who are studying for a statistics exam and those who are not on the other. We hypothesize that the proportion of studying individuals is lower among the men than among the women, and we want to test whether any difference in the proportions that we observe is significant. The data look like this:


Table 5
               Male   Female   Total
Studying       1      9        10
Not Studying   11     3        14
Total          12     12       24

Some of the observed cell counts are small, which makes Fisher’s exact test an appropriate test of independence for these categorical data. To test the null hypothesis of independence against the alternative that male students study less than female students, we consider all possible tables whose marginal totals are the same as in the observed data. In this example there are 11 such tables, and the observed data correspond to (ii) in Figure 3.

Figure 3

(i)
0 10
12 2

(ii)
1 9
11 3

(iii)
2 8
10 4

(iv)
3 7
9 5

(v)
4 6
8 6

(vi)
5 5
7 7

(vii)
6 4
6 8

(viii)
7 3
5 9

(ix)
8 2
4 10

(x)
9 1
3 11

(xi)
10 0
2 12

Table 6
Case    a    b    c    d    Probability
i       0    10   12   2    3.36519E-05
ii      1    9    11   3    0.001346076
iii     2    8    10   4    0.016657693
iv      3    7    9    5    0.088841028
v       4    6    8    6    0.2332077
vi      5    5    7    7    0.319827702
vii     6    4    6    8    0.2332077
viii    7    3    5    9    0.088841028
ix      8    2    4    10   0.016657693
x       9    1    3    11   0.001346076
xi      10   0    2    12   3.36519E-05
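The probabilities in Table 6 can be regenerated from the margins of Table 5 alone. The sketch below uses base R; it assumes a is the number of studying males and derives the other three cells from the fixed totals. The data frame name tab and its column names are illustrative.

```r
# a = number of male students who are studying; the margins are fixed as in
# Table 5 (12 males, 12 females, 10 studying, 14 not studying)
a <- 0:10

tab <- data.frame(
  case = tolower(as.character(as.roman(1:11))),
  a = a,
  b = 10 - a,   # studying females
  c = 12 - a,   # non-studying males
  d = 2 + a,    # non-studying females
  probability = dhyper(a, 12, 12, 10)
)
print(tab, digits = 6)
```

Because both column totals equal 12, the distribution is symmetric about case (vi), which is why the probabilities in the table mirror each other.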

Finally, to calculate the significance of the observed data under the null hypothesis, we consider only the tables that are as extreme as or more extreme than the observed data in the same direction. In this example these are (i) and (ii), with probabilities 0.0000336519 and 0.001346076 respectively; see Table 6. Thus, the significance of the observed data is 0.0000336519 + 0.001346076 = 0.001379728, which is the one-tailed p-value. It is highly significant, so we reject the null hypothesis that the proportion of studying men is the same as the proportion of studying women.

In R, this can be obtained with fisher.test():

## one-tailed Fisher's exact test
fisher.test(rbind(c(1, 11), c(9, 3)), alternative="less")$p.value
[1] 0.001379728

For the two-tailed test in this example, we consider all tables in Figure 3 that are as extreme as or more extreme than the observed data, that is, tables whose probability is less than or equal to that of the observed data. In this example these are (i), (ii), (x), and (xi), with probabilities 0.0000336519, 0.001346076, 0.001346076 and 0.0000336519 respectively. The two-tailed p-value is therefore 0.0000336519 + 0.001346076 + 0.001346076 + 0.0000336519 = 0.002759456, which is highly significant. Thus, we reject the null hypothesis that the proportions of studying men and studying women are equal. In other words, the proportion of studying men is significantly lower than the proportion of studying women.

In R, the two-tailed Fisher’s exact test can be obtained with the exact2x2() function from the exact2x2 package:

## two-tailed Fisher's exact test
library(exact2x2)  # provides exact2x2()
exact2x2(rbind(c(1,11),c(9,3)), tsmethod="minlike")$p.value
[1] 0.002759456
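If the exact2x2 package is not installed, the same two-tailed sum can be sketched in base R by adding up every table whose probability does not exceed that of the observed table. The small tolerance factor is an assumption of this sketch, added to guard against floating-point ties; the names probs, p_observed, and p_two are illustrative.

```r
# Distribution of a (studying males) given the margins of Table 5
a <- 0:10
probs <- dhyper(a, 12, 12, 10)

# Probability of the observed table (a = 1)
p_observed <- probs[a == 1]

# Two-tailed p-value: all tables no more probable than the observed one.
# The factor (1 + 1e-7) is a tolerance chosen for this sketch so that
# tables tied with the observed one are not lost to rounding error.
p_two <- sum(probs[probs <= p_observed * (1 + 1e-7)])

# Base R's fisher.test() is two-sided by default
p_builtin <- fisher.test(rbind(c(1, 11), c(9, 3)))$p.value
print(c(manual = p_two, fisher.test = p_builtin))
```

Because the distribution here is symmetric, the two-tailed value is exactly twice the one-tailed value; in tables with unequal column totals the two tails generally differ.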

CONCLUSION
We have shown that Fisher’s exact test is very useful in categorical data analysis, especially when the sample size is small and the chi-square approximation is unreliable. Under the null hypothesis and with fixed marginal totals, the probability of the observed data follows the hyper-geometric distribution. The p-value of Fisher’s exact test is easy and fast to calculate with computer software such as R. In the “lady tasting tea” experiment, the exact test showed that there is not enough evidence that the lady has discerning ability; in the “men and women study habit” example, on the other hand, it showed that the proportion of studying women is greater than the proportion of studying men.

ACKNOWLEDGEMENT

I would like to express my deepest and most sincere thanks to my parents, family, and friends for their unconditional support and motivation to finish this paper; to my very understanding and supportive research adviser, Ma’am Danielle Vne A. Dondoyano; to my classmates and the MIM students; and most especially to Almighty God for the knowledge, strength, and health to finish this paper.

REFERENCES

[1] Cox, I. M., Campbell, M. J., Dowson, D. (1991). “Red blood cell magnesium and chronic fatigue syndrome”. Lancet, 337, 757-760.

[2] Agresti, A. (2002). “Categorical Data Analysis”. Hoboken, NJ: John Wiley and Sons, Inc.

[3] Fisher, R. A. (1922). “On the interpretation of χ² from contingency tables, and the calculation of P”. Journal of the Royal Statistical Society, 85(1), 87-94.

[4] R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

[5] Fisher, R. A. (1956). “Mathematics of a Lady Tasting Tea”. Courier Dover Publications. ISBN 978-0-486-4115-4.
