
# CHI-SQUARED TEST

Course 5 Econometrics
Introduction
• Two statistical techniques for analyzing qualitative data are presented:
– A goodness-of-fit test for the multinomial experiment.
– A contingency table test of independence.
• Both tests use the χ² (chi-squared) distribution as the sampling distribution of the test statistic.
Chi-squared Goodness-of-Fit Test
• This test describes a single population of qualitative data.
• The multinomial experiment studied is an extension of the
binomial experiment.
– There are n independent trials.
– The outcome of each trial can be classified into one of
k categories, called cells.
– The probability $p_i$ of cell $i$ remains constant from trial to trial, and $p_1 + p_2 + \dots + p_k = 1$.
• The hypothesis tested involves the values of pi.
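The multinomial experiment just described can be simulated with Python's standard `random` module. This is a minimal sketch under stated assumptions: the helper name `multinomial_experiment` is hypothetical, and the cell probabilities are the pre-campaign market shares from the example (45%, 40%, 15%).

```python
import random
from collections import Counter


def multinomial_experiment(n, probs, seed=None):
    """Run n independent trials; each trial falls into cell i with probability probs[i].

    Returns the observed count for every cell, in cell order.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)                          # seeded generator for reproducibility
    cells = range(len(probs))
    outcomes = rng.choices(cells, weights=probs, k=n)  # one cell label per trial
    tally = Counter(outcomes)
    return [tally[i] for i in cells]                   # Counter returns 0 for empty cells


# Hypothetical run with the pre-campaign shares: Company A, Company B, others
counts = multinomial_experiment(200, [0.45, 0.40, 0.15], seed=1)
print(counts)
```

Each run gives counts that scatter around the expected frequencies 200·p_i; the χ² statistic measures whether the observed scatter is larger than chance alone would explain.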
• Example
– Two competing companies, A and B, have conducted marketing campaigns.
– Market shares before the campaigns were:
• Company A = 45%
• Company B = 40%
• Other competitors = 15%.
– To study the effects of the campaigns on the market shares, 200 customers were asked to indicate their brand preference.
– Survey results:
• 102 customers preferred Company A’s product,
• 82 customers preferred Company B’s product,
• 16 customers preferred another competitor’s product.
• Solution
– The population investigated is the brand preferences.
– The data are qualitative (A, B, or other)
– This is a multinomial experiment (three categories).
– The question of interest: do p1, p2 and p3 after the campaigns differ from their values before the campaigns?
• The hypotheses are:
H0: p1 = .45, p2 = .40, p3 = .15
H1: At least one pi is not equal to its specified value

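Test statistic: what sample frequency would you expect in each category if the null hypothesis is true, and what actual frequencies did the sample return?

| Category | Expected frequency $e_i$ | Observed frequency $o_i$ |
|---|---|---|
| Company A | 200(.45) = 90 | 102 |
| Company B | 200(.40) = 80 | 82 |
| Other competitors | 200(.15) = 30 | 16 |

– The statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

– The rejection region is

$$\chi^2 > \chi^2_{\alpha,\,k-1}$$

– In our example:

$$\chi^2 = \frac{(102-90)^2}{90} + \frac{(82-80)^2}{80} + \frac{(16-30)^2}{30} = 8.18$$

$$\chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,3-1} = \chi^2_{.05,\,2} \approx 5.991$$

Conclusion: since 8.18 > 5.991, at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities $p_i$ is different. Thus, at least two market shares have changed (the shares must still sum to 100%, so a single share cannot change on its own).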
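The goodness-of-fit computation can be reproduced in a few lines of plain Python. This is a minimal sketch assuming only the standard library: instead of a χ² table it relies on the fact that, for exactly 2 degrees of freedom, the χ² distribution is exponential with mean 2. The function name `chi2_goodness_of_fit` is hypothetical.

```python
import math


def chi2_goodness_of_fit(observed, probs):
    """Return expected frequencies e_i = n * p_i and the statistic sum (o_i - e_i)^2 / e_i."""
    n = sum(observed)
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


observed = [102, 82, 16]           # Company A, Company B, others
probs_h0 = [0.45, 0.40, 0.15]      # market shares under H0
alpha = 0.05

expected, statistic = chi2_goodness_of_fit(observed, probs_h0)
df = len(observed) - 1             # k - 1 = 2

# Shortcut valid ONLY when df == 2: a chi-squared(2) variable is exponential
# with mean 2, so P(X > x) = exp(-x / 2) and the critical value is -2 ln(alpha).
assert df == 2
critical = -2 * math.log(alpha)
p_value = math.exp(-statistic / 2)

print("expected:", [round(e, 2) for e in expected])
print("statistic:", round(statistic, 2), "critical:", round(critical, 3))
print("reject H0" if statistic > critical else "do not reject H0")
```

For other degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)` or Excel's `CHIINV(alpha, df)` would supply the critical value instead of the df = 2 shortcut.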

• Rule of five
• The frequency data must have a precise
numerical value and must be organised into
categories or groups.
• For the χ² approximation to be valid, the expected frequency in every cell must be at least 5.
• If the expected frequency in a cell is less than 5,
combine it with other cells.
• The total number of observations must be
greater than 20.
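The "combine it with other cells" step can be automated. The sketch below is a hypothetical strategy for illustration only (in practice one merges categories that make sense together): it repeatedly pools the cell with the smallest expected frequency into the next-smallest one until every expected frequency reaches 5. The function name `merge_small_cells` and the frequencies in the usage lines are made up.

```python
def merge_small_cells(observed, expected, minimum=5.0):
    """Pool categories until every expected frequency is at least `minimum`.

    Hypothetical strategy: repeatedly merge the cell with the smallest
    expected frequency into the cell with the next-smallest one.
    Each cell is (list of original category indices, observed, expected).
    """
    cells = [([i], o, e) for i, (o, e) in enumerate(zip(observed, expected))]
    while len(cells) > 1:
        cells.sort(key=lambda cell: cell[2])        # smallest expected first
        if cells[0][2] >= minimum:
            break                                   # rule of five satisfied
        (ids_a, o_a, e_a), (ids_b, o_b, e_b) = cells[0], cells[1]
        cells = [(sorted(ids_a + ids_b), o_a + o_b, e_a + e_b)] + cells[2:]
    cells.sort(key=lambda cell: cell[0][0])         # back to original category order
    return cells


# Made-up frequencies: the last two categories are too small on their own.
observed = [40, 35, 3, 2]
expected = [38.0, 36.0, 3.5, 2.5]
for ids, o, e in merge_small_cells(observed, expected):
    print(ids, o, e)
```

Merging reduces the number of categories k, so the degrees of freedom k - 1 used for the critical value shrink accordingly.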
Chi-squared Test of a Contingency Table
TEST OF INDEPENDENCE
• This test serves two objectives:
– Are two qualitative variables related?
– Are there differences among two or more
populations of qualitative variables?
• To accomplish the test objectives, we need to
classify the data according to two different
criteria.
• Contingency table (survey of drinkers)

| | Wine | Beer | Vodka | Total |
|---|---|---|---|---|
| Women | 28 | 32 | 10 | 70 |
| Men | 20 | 40 | 20 | 80 |

• The null hypothesis is that the population percentage favoring each of the three drinks is the same for women and men (note that this does NOT imply that each type of drink has an equal population percentage of 33.33%).
• The alternative hypothesis is that, for at least one drink, the population percentage favoring it differs between women and men.
• Equivalently, the null hypothesis can be stated as “drinkers’ preference for a type of drink is independent of their gender”, and the alternative hypothesis as “there is a relationship between drink preference and the gender of drinkers”.
• The observed frequencies are those given in the contingency table.
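To see what H0 compares, a short Python sketch (standard library only) can print the percentage of women and of men favoring each drink, next to the pooled percentages that H0 would assign to both groups. The pooled values are estimated from the column totals, which is why they need not be 33.33% each. Variable names are illustrative.

```python
drinks = ["Wine", "Beer", "Vodka"]
observed = {"Women": [28, 32, 10], "Men": [20, 40, 20]}

# Percentage of each group favoring each drink (row percentages).
for group, row in observed.items():
    total = sum(row)
    shares = {d: round(100 * x / total, 1) for d, x in zip(drinks, row)}
    print(group, shares)

# Under H0 both groups share one set of percentages, estimated from the column totals.
column_totals = [sum(col) for col in zip(*observed.values())]
n = sum(column_totals)
pooled = {d: round(100 * c / n, 1) for d, c in zip(drinks, column_totals)}
print("Pooled", pooled)
```

Whether the gaps between the women's and men's percentages are larger than sampling variation would explain is exactly what the χ² test decides.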
• Computation of expected frequencies
In general, the expected frequency $e_{ij}$ of each cell (row $i$, column $j$) of a contingency table may be computed with the following formula:

$$e_{ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Sample size}}$$
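The expected-frequency formula translates directly into code. A minimal Python sketch; the function name `expected_frequencies` is hypothetical, and the input is the drinkers survey table.

```python
def expected_frequencies(table):
    """Return e_ij = (row i total) * (column j total) / sample size for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [
    [28, 32, 10],   # Women: wine, beer, vodka
    [20, 40, 20],   # Men:   wine, beer, vodka
]
for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])
```

Every row of expected values is proportional to the column totals, so the expected table keeps the observed row and column totals; numerically, this is what "the same percentages for women and men" means.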
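• Computation of expected frequencies (observed counts, with expected counts in parentheses); for example, $e_{11} = 70 \times 48 / 150 = 22.40$:

| | Wine | Beer | Vodka | Total |
|---|---|---|---|---|
| Women | 28 (22.40) | 32 (33.60) | 10 (14.00) | 70 |
| Men | 20 (25.60) | 40 (38.40) | 20 (16.00) | 80 |
| Total | 48 | 72 | 30 | 150 |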

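$$\chi^2_{\text{computed}} = \sum_{i}\sum_{j} \frac{(o_{ij}-e_{ij})^2}{e_{ij}} = \frac{(28-22.40)^2}{22.40} + \frac{(32-33.60)^2}{33.60} + \frac{(10-14.00)^2}{14.00} + \frac{(20-25.60)^2}{25.60} + \frac{(40-38.40)^2}{38.40} + \frac{(20-16.00)^2}{16.00} = 4.9107$$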
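The double sum over all cells can be checked with a small Python sketch (standard library only). The function name `chi2_independence_statistic` is hypothetical; it recomputes each expected frequency from the row and column totals before accumulating the squared deviations.

```python
def chi2_independence_statistic(table):
    """Sum of (o_ij - e_ij)^2 / e_ij over all cells of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency e_ij
            statistic += (o - e) ** 2 / e
    return statistic


observed = [[28, 32, 10],   # Women: wine, beer, vodka
            [20, 40, 20]]   # Men:   wine, beer, vodka
print(round(chi2_independence_statistic(observed), 4))
```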

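• Make the statistical decision

The number of degrees of freedom is df = (r - 1)(c - 1), where r is the number of rows and c the number of columns of the contingency table. In our example, we have (2 - 1)(3 - 1) = 2 degrees of freedom. If we choose α = .05, then $\chi^2_{.05,2}$ = CHIINV(.05, 2) = 5.99 (CHIINV is an Excel function).

$$\chi^2_{\text{computed}} = 4.91 < \chi^2_{.05,\,2} = 5.99$$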

We therefore cannot reject the null hypothesis H0: at the 5% significance level, there is not enough evidence to conclude that the population percentage favoring each of the three types of drinks differs between women and men.
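The decision step can also be checked without a printed table. A minimal sketch, assuming only Python's standard library: for an even number of degrees of freedom 2m, the χ² upper-tail probability has the closed form $P(X > x) = e^{-x/2}\sum_{j=0}^{m-1} (x/2)^j / j!$, and the critical value is found here by bisection. The function names are hypothetical; odd degrees of freedom would need the incomplete gamma function (for example via SciPy's `scipy.stats.chi2`).

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with an EVEN, positive number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0                 # j = 0 term of the Poisson-type sum
    for j in range(1, df // 2):
        term *= half / j                   # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def chi2_critical_even_df(alpha, df, hi=1000.0):
    """Upper-alpha critical value, found by bisection (the tail probability decreases in x)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid                       # tail still too heavy: critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2


rows, cols = 2, 3
df = (rows - 1) * (cols - 1)
statistic = 4.9107                         # value computed for the drinkers table
critical = chi2_critical_even_df(0.05, df)
p_value = chi2_sf_even_df(statistic, df)
print(df, round(critical, 2), round(p_value, 3))
print("reject H0" if statistic > critical else "do not reject H0")
```

With df = 2 the sum has a single term, so the critical value reduces to -2 ln(α), which agrees with CHIINV(.05, 2) = 5.99.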
Assumptions / Limitations

• The data come from a random sample.
• A sufficiently large sample size is required (at least 20 observations).
• The data must be actual counts (not percentages).
• Adequate cell sizes must be present: an expected frequency of at least 5 in every cell.
• Observations must be independent.
• The test does not prove causality.