
Mathematics Term 3

STPM Chapter 6 Chi-squared Tests

6.1 The Chi-squared Distribution

The hypothesis tests discussed in the last chapter each involve a null hypothesis stated in terms of a population parameter and a test statistic having a known probability distribution. They are called parametric tests. However, not all hypotheses can be stated in terms of population parameters. In this chapter, we shall discuss a non-parametric test called the chi-squared test, which is performed using the chi-squared distribution.

Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the sampling distribution of the statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2}$$

is called the chi-squared distribution with $n$ degrees of freedom. The probability density function is given by

$$f(\chi^2_\nu) = c\,(\chi^2_\nu)^{\frac{\nu}{2} - 1}\, e^{-\frac{\chi^2_\nu}{2}}$$

where $c$ is a constant, $\chi^2_\nu$ is the chi-squared statistic with $\nu$ degrees of freedom and $e$ is the base of the natural logarithm. The constant $c$ is a normalising factor chosen so that the area under the chi-squared curve is equal to one.
Examples of chi-squared distributions with various degrees of freedom are shown in the figure below. When the statistic is computed from the sample variance, one degree of freedom is lost, so $\nu = n - 1$. The curve for $\nu = n - 1 = 3 - 1 = 2$ degrees of freedom represents the distribution of chi-squared values computed from all possible samples of size 3. Likewise, the curve for 10 degrees of freedom corresponds to the distribution for samples of size 11.


The chi-squared distribution has the following properties:

• The values of $\chi^2$ cannot be negative.
• The curve is not symmetric.
• The curves are all positively skewed.
• As $\nu$ gets larger, the degree of skewness decreases.
• The mean of the distribution is equal to the number of degrees of freedom: $\mu = \nu$.
• The variance is equal to two times the number of degrees of freedom: $\sigma^2 = 2\nu$.
• When the degrees of freedom are greater than or equal to 2, the maximum value occurs when $\chi^2 = \nu - 2$.
• As the degrees of freedom increase, the chi-squared curve approaches a normal distribution.
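
The properties listed above can be checked numerically. The sketch below is not part of the text: it assumes the standard closed form of the normalising constant, $c = 1 / (2^{\nu/2}\,\Gamma(\nu/2))$, and uses a simple midpoint rule. The helper names `chi2_pdf` and `integrate` are illustrative only. For $\nu = 6$ it estimates the area under the curve, the mean and the variance, which should come out close to 1, $\nu$ and $2\nu$.

```python
import math

def chi2_pdf(x, v):
    """Chi-squared density with v degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    # Assumed standard form of the normalising constant c.
    c = 1.0 / (2 ** (v / 2) * math.gamma(v / 2))
    return c * x ** (v / 2 - 1) * math.exp(-x / 2)

def integrate(f, a, b, steps=100_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

v = 6
upper = 100.0  # the density beyond this point is negligible for v = 6

area = integrate(lambda x: chi2_pdf(x, v), 0.0, upper)
mean = integrate(lambda x: x * chi2_pdf(x, v), 0.0, upper)
variance = integrate(lambda x: (x - v) ** 2 * chi2_pdf(x, v), 0.0, upper)

print(f"area = {area:.4f}, mean = {mean:.4f}, variance = {variance:.4f}")
```

Changing `v` to a larger value and comparing `chi2_pdf(v - 2, v)` with neighbouring points is a quick way to see the maximum at $\chi^2 = \nu - 2$.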

The area under the curve between 0 and a particular chi-squared value is the cumulative probability associated with that chi-squared value. For example, the figure below is a graph of the chi-squared distribution with 6 degrees of freedom. The shaded area represents the cumulative probability associated with a chi-squared statistic equal to x; that is, it is the probability that the value of a chi-squared statistic will fall between 0 and x.


The $\chi^2$-distribution table gives values of $\chi^2$ for various values of $\alpha$ and $\nu$, where $\alpha$ and $\nu$ represent the significance level and the degrees of freedom respectively. The areas, $\alpha$, are the column headings (labelled p in the table, each being the area to the left of the tabulated value); the degrees of freedom, $\nu$, are given in the left column, and the table entries are the $\chi^2$ values. Hence the $\chi^2$ value with 6 degrees of freedom, leaving an area of 0.05 to the left, is $\chi^2_{0.05} = 1.635$. Owing to the lack of symmetry, we must also use the table to find $\chi^2_{0.95} = 12.592$ for $\alpha = 0.95$.
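
The two table readings for 6 degrees of freedom can be cross-checked without a table. The sketch below is an illustration under one assumption that is not in the text: for an even number of degrees of freedom, the cumulative probability has the closed form $P(X < x) = 1 - e^{-x/2} \sum_{k=0}^{\nu/2 - 1} \frac{(x/2)^k}{k!}$. The function name `chi2_cdf_even` is hypothetical.

```python
import math

def chi2_cdf_even(x, v):
    """P(X < x) for a chi-squared variable with an even number v of degrees of freedom."""
    if v <= 0 or v % 2 != 0:
        raise ValueError("this closed form needs a positive even v")
    if x <= 0:
        return 0.0
    half = x / 2
    # Finite sum of (x/2)^k / k! for k = 0, ..., v/2 - 1.
    partial = sum(half ** k / math.factorial(k) for k in range(v // 2))
    return 1.0 - math.exp(-half) * partial

lower_area = chi2_cdf_even(1.635, 6)   # area to the left of 1.635
upper_area = chi2_cdf_even(12.592, 6)  # area to the left of 12.592

print(f"P(X < 1.635)  = {lower_area:.4f}")
print(f"P(X < 12.592) = {upper_area:.4f}")
```

The two printed areas should agree with the column headings 0.05 and 0.95 to about three decimal places.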


Critical values for the $\chi^2$-distribution

If X has a $\chi^2$-distribution with $\nu$ degrees of freedom, then for each pair of values of p and $\nu$, the tabulated value of x is such that

P(X < x) = p.

 p      0.01       0.025      0.05       0.9     0.95    0.975   0.99    0.995   0.999
 ν
 1      0.0001571  0.0009821  0.003932   2.706   3.841   5.024   6.635   7.879   10.83
 2      0.02010    0.05064    0.1026     4.605   5.991   7.378   9.210   10.60   13.82
 3      0.1148     0.2158     0.3518     6.251   7.815   9.348   11.34   12.84   16.27
 4      0.2971     0.4844     0.7107     7.779   9.488   11.14   13.28   14.86   18.47
 5      0.5543     0.8312     1.145      9.236   11.07   12.83   15.09   16.75   20.51
 6      0.8721     1.237      1.635      10.64   12.59   14.45   16.81   18.55   22.46
 7      1.239      1.690      2.167      12.02   14.07   16.01   18.48   20.28   24.32
 8      1.647      2.180      2.733      13.36   15.51   17.53   20.09   21.95   26.12
 9      2.088      2.700      3.325      14.68   16.92   19.02   21.67   23.59   27.88
10      2.558      3.247      3.940      15.99   18.31   20.48   23.21   25.19   29.59
11      3.053      3.816      4.575      17.28   19.68   21.92   24.73   26.76   31.26
12      3.571      4.404      5.226      18.55   21.03   23.34   26.22   28.30   32.91
13      4.107      5.009      5.892      19.81   22.36   24.74   27.69   29.82   34.53
14      4.660      5.629      6.571      21.06   23.68   26.12   29.14   31.32   36.12
15      5.229      6.262      7.261      22.31   25.00   27.49   30.58   32.80   37.70
16      5.812      6.908      7.962      23.54   26.30   28.85   32.00   34.27   39.25
17      6.408      7.564      8.672      24.77   27.59   30.19   33.41   35.72   40.79
18      7.015      8.231      9.390      25.99   28.87   31.53   34.81   37.16   42.31
19      7.633      8.907      10.12      27.20   30.14   32.85   36.19   38.58   43.82
20      8.260      9.591      10.85      28.41   31.41   34.17   37.57   40.00   45.31
21      8.897      10.28      11.59      29.62   32.67   35.48   38.93   41.40   46.80
22      9.542      10.98      12.34      30.81   33.92   36.78   40.29   42.80   48.27
23      10.20      11.69      13.09      32.01   35.17   38.08   41.64   44.18   49.73
24      10.86      12.40      13.85      33.20   36.42   39.36   42.98   45.56   51.18
25      11.52      13.12      14.61      34.38   37.65   40.65   44.31   46.93   52.62
26      12.20      13.84      15.38      35.56   38.89   41.92   45.64   48.29   54.05
27      12.88      14.57      16.15      36.74   40.11   43.19   46.96   49.65   55.48
28      13.56      15.31      16.93      37.92   41.34   44.46   48.28   50.99   56.89
29      14.26      16.05      17.71      39.09   42.56   45.72   49.59   52.34   58.30
30      14.95      16.79      18.49      40.26   43.77   46.98   50.89   53.67   59.70


Example 1

The curve of the chi-squared distribution with $\nu = 3$ degrees of freedom is shown below. Find the critical value of $\chi^2$ such that the area in the shaded region is 0.025.

Solution

Look it up in the table by proceeding down the left column, headed $\nu$ (degrees of freedom), to $\nu = 3$. Then move to the right until the column labelled 0.975 is found. The result is 9.348. Thus we have $P(\chi^2 > 9.348) = 0.025$.

Example 2

A factory produces a particular type of drill. On average, the useful operating life is 5.5 hours, with a standard deviation of 0.47 hour. The quality control department runs a test by randomly selecting six drills. The standard deviation of the selected drills is 0.61 hour. Determine the chi-squared statistic represented by this test.

Solution

Given $\sigma = 0.47$ hour, $s = 0.61$ hour, and the number of sample observations $n = 6$, the chi-squared statistic is

$$\chi^2 = \frac{nS^2}{\sigma^2} = \frac{6(0.61^2)}{0.47^2} = 10.107$$
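
The arithmetic in Example 2 can be reproduced in a few lines. This is only a sketch of the formula $\chi^2 = nS^2/\sigma^2$ used in the solution; the function name `variance_chi2_statistic` is illustrative.

```python
def variance_chi2_statistic(n, s, sigma):
    """Chi-squared statistic n * s^2 / sigma^2 for a sample of size n."""
    if sigma <= 0:
        raise ValueError("the population standard deviation must be positive")
    return n * s ** 2 / sigma ** 2

# Values from Example 2: six drills, s = 0.61 hour, sigma = 0.47 hour.
stat = variance_chi2_statistic(n=6, s=0.61, sigma=0.47)
print(f"chi-squared statistic = {stat:.3f}")
```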

Exercise 6.1

1. Find the 95th percentile of the chi-squared distribution with 9 degrees of freedom.

2. Using the chi-squared distribution table, find

(a) (b) (c)

< 18.4s), P(X1, > 1e.81), P(X'r, ) 32.67).


P(x:,

3. Given $\nu$ and $\alpha$, find the critical value(s) for each


(a)
a--

case

(b)

(c)

4.

Using the chi-squared distribution table, find the value of k such that

(a) (b) (c)


5.

< k) = 0.0t P(x1, > k) = o.es P(k < x2s < 9.39) = o.o4
P(X1,

(a) Find the mean and the standard deviation of a chi-squared distribution with 8 degrees of freedom.
(b) Which one of the following chi-squared distributions looks the most like a normal distribution?
    (i) A chi-squared distribution with 1 degree of freedom
    (ii) A chi-squared distribution with 2 degrees of freedom
    (iii) A chi-squared distribution with 5 degrees of freedom
    (iv) A chi-squared distribution with 10 degrees of freedom

6.

A random sample of 30 observations from a normal population with variance $\sigma^2 = 8.3$ is found to have a sample variance $s^2 = 11.72$. Determine the chi-squared statistic from this experiment.

The chi-squared test can be used to test how good the fit is between observed frequencies and expected frequencies.

Observed frequencies are the actual frequencies observed from a random sample. Expected frequencies are theoretical frequencies based on a distribution under the null hypothesis, which is presumed to be true until statistical evidence indicates otherwise. As an example, what would we expect when flipping a coin 12 times? By chance, we would expect six heads and six tails. If we observe one head and eleven tails in this experiment, would this outcome be attributable merely to chance, or is it due to the coin being biased? The chi-squared test can help provide an answer.

Before discussing the chi-squared test, we have several assumptions to make. First, frequency data is used to represent the actual number of elements in each category. Second, categories are mutually exclusive, that is, whatever is being tallied can only be in one cell and cannot overlap. Third, categorical data is a grouping of data according to similar characteristics in a way that shows the frequencies of each category.

Let us look at an example to see how we use the chi-squared test to determine whether the frequencies observed across the categories differ significantly from what is expected theoretically. Consider the tossing of a six-sided dice. We have the null hypothesis that the dice is fair, which is equivalent to the hypothesis that the distribution of outcomes is uniform. Suppose that the dice is thrown 60 times and each outcome is recorded. The observed frequency $o_i$ for each face of the dice is shown in the table below:
Face                       1    2    3    4    5    6
Observed frequency o_i    12    8   14    7    9   10

The chi-squared test will compare the observed frequencies $o_i$ with the corresponding expected frequencies $e_i$. The table above lists the observed frequencies, and the expected frequencies need to be determined.

To calculate the expected frequency for each outcome, we make use of the hypothesis that the outcome of a fair dice is uniformly distributed. Since the probability of each outcome is one-sixth and there are a total of 60 rolls of the dice, we have

$$\text{Expected frequency } e_i = \frac{1}{6} \times 60 = 10$$

Note that the expected frequencies are anticipated only in a theoretical sense. It is not realistic to expect the observed frequencies to match the expected frequencies perfectly. The table below lists the observed and expected frequencies for each category:

Face                       1    2    3    4    5    6
Observed frequency o_i    12    8   14    7    9   10
Expected frequency e_i    10   10   10   10   10   10

Now, we need to decide whether the observed frequencies are reasonably close to the expected frequencies or really different from them. The hypothesis to be tested is how well the observed frequencies fit a given pattern or a theoretical distribution. The test is called a goodness-of-fit test.

A useful measure for the overall discrepancy between the observed and expected frequencies is the chi-squared test statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where $\chi^2$ is a value of a random variable whose sampling distribution is very closely approximated by the chi-squared distribution with $k - 1$ degrees of freedom, and $k$ is the number of categories. The symbols $o_i$ and $e_i$ represent the observed and expected frequencies respectively for the $i$th category.
For the chi-squared goodness-of-fit test, the number of degrees of freedom shows the number of independent free choices which can be made in allocating values to the expected frequencies. In this example of tossing a dice, there are six expected frequencies (one for each face, that is, 1 to 6), and only five of the expected frequencies can vary independently; the sixth one must take whatever value is required to fulfil the constraint of total frequency. Thus, the degrees of freedom $\nu$ = number of categories - number of constraints. Here there are six categories and one constraint, so $\nu = 6 - 1 = 5$.

To calculate the chi-squared test statistic, we first subtract the expected frequency $e_i$ from the observed frequency $o_i$. Then we square the difference and subsequently divide the squared difference by the expected frequency $e_i$, before finally adding the quotients. This is done in the table below:

Face    o_i    e_i    (o_i - e_i)    (o_i - e_i)^2    (o_i - e_i)^2 / e_i
1       12     10      2              4               0.4
2        8     10     -2              4               0.4
3       14     10      4             16               1.6
4        7     10     -3              9               0.9
5        9     10     -1              1               0.1
6       10     10      0              0               0
                                            χ² =      3.4

This means the value of $\chi^2$ with 5 degrees of freedom is 3.4. In the goodness-of-fit test, if the observed frequencies are the same as the expected frequencies, then $\chi^2 = 0$. Thus, if the $\chi^2$ value is small, there is a high degree of compatibility between the expected and observed frequencies, indicating a good fit. If the $\chi^2$ value is large, there is a low degree of matching between the two sets of frequencies and the fit is poor. This also implies that the critical region falls in the right tail of the chi-squared distribution. At the 10% significance level, we find $\chi^2_{0.90} = 9.236$ for 5 degrees of freedom using the $\chi^2$ table. Since the calculated value $\chi^2 = 3.4$ is less than 9.236, the data support the hypothesis that the outcomes of the dice are uniformly distributed. In other words, there is no evidence that the dice is biased.

Note: To perform a chi-squared test, the expected frequency for each category should be at least 5. This restriction may require combining adjacent categories, resulting in a reduction of the number of degrees of freedom.
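
The dice calculation above can be sketched in code as follows. The observed frequencies and the critical value 9.236 are taken from the text; the function name `chi2_statistic` is illustrative, and the check on small expected frequencies simply mirrors the note above.

```python
def chi2_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12, 8, 14, 7, 9, 10]           # faces 1 to 6, 60 throws in total
total = sum(observed)
expected = [total / 6] * 6                 # fair dice: each face has probability 1/6

# The note above asks for every expected frequency to be at least 5.
all_large_enough = all(e >= 5 for e in expected)

stat = chi2_statistic(observed, expected)
dof = len(observed) - 1                    # one constraint: the total frequency
critical_value = 9.236                     # table value, 5 degrees of freedom, p = 0.9
reject_h0 = stat > critical_value

print(f"chi-squared = {stat:.1f}, degrees of freedom = {dof}, reject H0: {reject_h0}")
```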


Example 3

A quality supervisor at a glass manufacturing factory inspects a random sample of 60 sheets of glass to check for any minor defects. The number of flaws in each glass sheet is recorded. The results are as follows:

Number of flaws        0     1     2     3
Observed frequency    32    15     9     4

Use a 5% significance level to test the hypothesis that these data follow a Poisson distribution.

A test procedure is as follows.

Step 1: State the hypotheses
H0: The number of flaws in a glass sheet follows a Poisson distribution
H1: The number of flaws in a glass sheet does not follow a Poisson distribution

Step 2: Specify the significance level
Here $\alpha = 0.05$

Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared goodness-of-fit test to determine whether the observed sample frequencies differ significantly from the expected frequencies specified in the null hypothesis.

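The mean of the presumed Poisson distribution is unknown, so it must be estimated from the data by the sample mean,

$$\lambda = \frac{\sum o x}{\sum o} = \frac{32 \times 0 + 15 \times 1 + 9 \times 2 + 4 \times 3}{32 + 15 + 9 + 4} = \frac{45}{60} = 0.75$$

Hence, with $\lambda = 0.75$,

$$P(X = x_i) = \frac{e^{-0.75}\, 0.75^{x_i}}{x_i!}, \qquad x_i = 0, 1, 2, 3$$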

which gives the following probability associated with each class. The corresponding expected frequency is obtained by multiplying the appropriate Poisson probability by the sample size n = 60.

x_i          P(X = x_i)    e_i
0            0.472         28.32
1            0.354         21.24
2            0.133          7.98
3 or more    0.041          2.46

If an expected frequency is less than 5, two or more classes can be combined. In the above situation the expected frequency in the last class is less than 5, so we should combine the last two classes to get

Number of flaws        0        1        2 or more
Observed frequency     32       15       13
Expected frequency     28.32    21.24    10.44

The chi-squared value can now be calculated:

$$\chi^2 = \sum \frac{(o - e)^2}{e} = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94$$

Step 4: Determine the critical region
Since both the total frequency and the mean of the Poisson distribution are estimated from the observed data, the number of degrees of freedom is $k - 2$. Here, we have 3 classes, thus the chi-squared statistic has $3 - 2 = 1$ degree of freedom. Using a significance level of 0.05, from the chi-squared distribution table, the critical value $\chi^2_{0.95}$ with 1 degree of freedom is 3.841.

Step 5: Make a decision
As $\chi^2 = 2.94 < 3.841$, we conclude that there is no real evidence to suggest the data do not follow a Poisson distribution.
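
The steps of Example 3 can be traced in a short script. This is a sketch under the example's own assumptions (Poisson mean estimated from the sample, last two classes pooled, critical value 3.841 from the table); the helper name `poisson_pmf` is illustrative. Because the script does not round the probabilities to three decimal places, its statistic is close to, but not exactly, the 2.94 obtained by hand.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

flaws = [0, 1, 2, 3]
observed = [32, 15, 9, 4]
n = sum(observed)

# Estimate the Poisson mean by the sample mean.
lam = sum(x * o for x, o in zip(flaws, observed)) / n

# Classes 0, 1, 2 and "3 or more" (the remaining probability).
probs = [poisson_pmf(x, lam) for x in (0, 1, 2)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# The last expected frequency is below 5, so pool the last two classes.
observed_pooled = observed[:2] + [sum(observed[2:])]
expected_pooled = expected[:2] + [sum(expected[2:])]

stat = sum((o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled))
dof = len(observed_pooled) - 2      # constraints: total frequency and estimated mean
critical_value = 3.841              # table value, 1 degree of freedom, p = 0.95
reject_h0 = stat > critical_value

print(f"mean = {lam}, chi-squared = {stat:.2f}, dof = {dof}, reject H0: {reject_h0}")
```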

Example 4

A random sample of 50 telephone calls has the call lengths shown in the table below, with sample mean $\bar{x} = 14$ minutes and sample standard deviation $s = 6.4$ minutes. Determine whether there is significant evidence, at the 5% significance level, to reject the null hypothesis that the call length has a normal distribution.

Call length (in minutes)    Frequency
0-5                          4
5-10                         9
10-15                       16
15-20                       13
20-25                        5
25-30                        3

We proceed with the steps of a test procedure as follows:

Step 1: State the hypotheses
H0: The telephone call lengths follow a normal distribution
H1: The telephone call lengths do not follow a normal distribution

Step 2: Specify the significance level
Here $\alpha = 0.05$

Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared goodness-of-fit test to determine whether the observed sample frequencies differ significantly from the expected frequencies specified in the null hypothesis.

The distribution of call lengths may be approximated by the normal distribution. The sample mean and sample standard deviation will be used for $\mu$ and $\sigma$ in calculating the z values corresponding to the class boundaries. The expected frequency for each class (category) listed in the given table can be obtained from a normal curve. The z values corresponding to the boundaries of the second class are

$$z_1 = \frac{5 - 14}{6.4} = -1.406, \qquad z_2 = \frac{10 - 14}{6.4} = -0.625$$

From the normal table, the area between $z_1 = -1.406$ and $z_2 = -0.625$ is

$$P(-1.406 < Z < -0.625) = P(Z < -0.625) - P(Z < -1.406) = 0.266 - 0.08 = 0.186$$

Thus, the expected frequency for the second class is $e_2 = 0.186 \times 50 = 9.3$.

The expected frequency for the first class interval is obtained by using the total area under the normal curve to the left of the boundary 5. For the last class interval, we use the total area to the right of the boundary 25. All other expected frequencies are found by the method described above for the second class. Adjacent classes whose expected frequencies are less than 5 are combined, so the total number of classes is reduced from 6 to 4. The observed and expected frequencies for the combined classes, together with the detailed calculations for the chi-squared value, are shown in the table below.

Class boundaries    o_i    e_i     (o_i - e_i)    (o_i - e_i)^2    (o_i - e_i)^2 / e_i
Below 10            13     13.3    -0.3           0.09             0.0068
10-15               16     14.8     1.2           1.44             0.0973
15-20               13     13.2    -0.2           0.04             0.0030
Above 20             8      8.8    -0.8           0.64             0.0727
                                                          χ² =     0.180

Step 4: Determine the critical region
Altogether three constraints, namely the total frequency, the sample mean and the sample standard deviation, have been estimated from the sample data, so the number of degrees of freedom is $k - 3 = 4 - 3 = 1$. Using a significance level of 0.05, the critical value of chi-squared with 1 degree of freedom is 3.841.

Step 5: Make a decision
As $\chi^2 = 0.180 < 3.841$, we have no reason to reject the null hypothesis, and we conclude that the normal distribution offers a good fit for the distribution of telephone call lengths.
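
The expected frequencies in Example 4 can also be obtained without a normal table. The sketch below assumes the four combined classes used in the example (below 10, 10-15, 15-20, above 20) with the observed frequencies 13, 16, 13 and 8, and computes the normal cumulative probability from `math.erf`; the helper name `normal_cdf` is illustrative. The result differs slightly from the 0.180 obtained by hand, because the hand calculation rounds the expected frequencies to one decimal place.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X < x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

mean, sd, n = 14.0, 6.4, 50
cuts = [10.0, 15.0, 20.0]            # boundaries between the four combined classes
observed = [13, 16, 13, 8]           # below 10, 10-15, 15-20, above 20

# Cumulative probabilities at the class boundaries, with 0 and 1 at the open ends.
cdf = [0.0] + [normal_cdf(c, mean, sd) for c in cuts] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 3              # constraints: total, mean and standard deviation
critical_value = 3.841               # table value, 1 degree of freedom, p = 0.95
reject_h0 = stat > critical_value

print("expected:", [round(e, 1) for e in expected])
print(f"chi-squared = {stat:.3f}, dof = {dof}, reject H0: {reject_h0}")
```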

Exercise 6.2

1. Assume that a chi-squared goodness-of-fit test is conducted. Determine the critical value of the chi-squared test statistic for each of the following cases.
   (a) Number of categories = 7, $\alpha$ = 0.01
   (b) Number of categories = 10, $\alpha$ = 0.10

2. A random sample of 500 observations is obtained and distributed into 4 categories as follows:

   Category    1     2      3      4
   x_i         49    263    146    42

   Use $\alpha$ = 0.05 to test the null hypothesis H0: $p_1 = 0.10$, $p_2 = 0.50$, $p_3 = 0.30$, $p_4 = 0.10$.

3. Three coins are tossed 150 times, and the observed frequencies of 0, 1, 2 and 3 heads per toss are 14, 43, 67 and 26 times respectively. Use a 5% significance level to test whether the three coins are balanced.

4. An experiment is to draw a card from a regular deck of 52 cards that has been thoroughly shuffled, and to record whether it is a spade, heart, diamond or club. This process is repeated 40 times, each time replacing the card just drawn. After 40 trials, 9 spades, 13 hearts, 11 diamonds and 7 clubs are obtained. Test the hypothesis that the deck is honest at the 10% significance level.

5. Each package of beans sold in the supermarket is supposed to mix red beans, mung beans, black beans and black-eyed beans in the ratio 5:3:1:1. A random sample of 400 mixed beans selected from these packages is found to have 210 red beans, 124 mung beans, 30 black beans and 36 black-eyed beans. Test the hypothesis that the package contains the mixed beans in the ratio 5:3:1:1 at the 0.05 significance level.

6. A boy buys a bag of 100 jelly beans. This bag has 5 different colours of jelly beans in it. Assume all five colours are equally likely to be put in the bag. The boy is curious about the colour distribution and opens the bag. He finds out that he has 17 brown, 24 yellow, 10 red, 31 green and 18 white. Test the hypothesis that the colours of the jelly beans occur with equal frequency at a significance level of 5%.

7. The number of road accidents per week at a junction is monitored by the public traffic department. The table below shows the frequency of accidents per week in 60 weeks.

   Number of accidents    0     1     2     3
   Observed frequency     28    15    12    5

   (a) Determine the mean number of accidents per week.
   (b) Test the hypothesis that the data follow a Poisson distribution at the 5% significance level.

8. The following frequency distribution table represents the number of days during a year that a total of 50 employees at a company are absent from work due to illness. It is thought that the data follow a normal distribution with population mean $\mu = 7$ and standard deviation $\sigma = 3$.

   Number of days absent    Number of employees
   0-3                       4
   3-6                      13
   6-9                      24
   9-12                      7
   12-15                     2

   Test the goodness of fit between the observed class frequencies and the corresponding expected frequencies of a normal distribution at the 5% significance level.

9. A paper shop has several retail stores in a city. The following table shows the number of boxes shipped per day for the last 100 days.

   Number of packages shipped    Number of days
   0-5                            5
   5-10                          13
   10-15                         28
   15-20                         23
   20-25                         18
   25-30                         10
   30-35                          3

   (a) Calculate the sample mean and sample standard deviation of the number of packages shipped per day.
   (b) Use a 5% significance level to test the goodness of fit between the observed class frequencies and the corresponding expected frequencies of a normal distribution.

10. The table below shows the number of rain days in January for the years from 1953 to 2004.

   Number of rain days    0    1    2     3     4    5
   Observed frequency     9    7    14    15    6    1

   (a) Find the mean number of rain days.
   (b) Test the hypothesis that the recorded data may be fitted by the Poisson distribution at the 10% significance level.

11. A recent study reports the number of hours of personal computer usage per week for a sample of 60 persons. Excluded from the study are people who work in the office and use the computer as part of their work.

   1.1    4.3    6.3    2.4    4.3
   6.7    4.5    2.1    2.4    9.7
   2.2    9.3    2.7    4.7    7.7
   2.6    5.3    0.4    1.7    5.2
   9.8    6.3    5.1    2.0    1.7
   6.4    8.8    5.6    6.7    8.5
   4.9    6.5    5.4    3.7    4.2
   5.2    0.6    4.8    3.3    5.5
   4.5    5.2    2.1    1.1    9.2
   9.3    6.6    10.1   2.7    8.5
   7.9    9.3    1.3    6.7    6.0
   4.6    4.3    5.6    6.5    8.1

   (a) Organise the data into a frequency distribution.
   (b) Compute the sample mean and sample standard deviation of the number of hours of computer usage per week.
   (c) It is thought that the data follow a normal distribution. Test the hypothesis at the 5% significance level.


When two attributes (variables) are observed for each element of a random sample, the data can be simultaneously classified with respect to these attributes in a two-way classification table called a contingency table. We can then determine whether there is a significant association between the two attributes.

Suppose we take a random sample of 200 persons and classify them based on gender as well as whether these persons own handphones. The observed frequencies are presented in the following 2 x 2 contingency table.

           Own handphone (yes)    Own handphone (no)    Total
Male        70                     60                    130
Female      30                     40                     70
Total      100                    100                    200

A contingency table can be of any size. In general, a contingency table with r rows and c columns is denoted as an r x c table. The row and column totals in the above table are called marginal frequencies. It is common practice to refer to each possible outcome of an experiment as a cell. Hence in our example we have four cells.

Let us test the hypothesis of independence between a person's gender and a person's possession of a handphone. To perform this test, we first calculate the expected frequencies for each of the four cells of the above 2 x 2 contingency table under the assumption that the hypothesis is true.

Let M represent the event that an individual selected from the sample is male. Let Y represent the event that an individual selected owns a handphone. Since M and Y are independent events, $P(M \cap Y) = P(M)P(Y)$.

But $P(M \cap Y) = \frac{e_{11}}{200}$, $P(M) = \frac{130}{200}$ and $P(Y) = \frac{100}{200}$. Thus, we have

$$\frac{e_{11}}{200} = \left(\frac{130}{200}\right)\left(\frac{100}{200}\right)$$

which we can rearrange as

$$e_{11} = \frac{130 \times 100}{200} = \frac{(\text{First row total})(\text{First column total})}{\text{Grand total}}$$

where $e_{11}$ is the expected frequency for the cell in row 1 and column 1.

The general formula for obtaining the expected frequency of any cell is given by

$$\text{Expected frequency} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$$

The expected frequency for each cell is recorded in parentheses beside the actual observed value in the table shown below.

           Own handphone (yes)    Own handphone (no)    Total
Male        70 (65)                60 (65)               130
Female      30 (35)                40 (35)                70
Total      100                    100                    200

Note that the expected frequencies in any row or column add up to the appropriate marginal total. We need to calculate only the one expected frequency in the top row of the table and then find the others by subtraction. The number of degrees of freedom associated with the chi-squared test used here is equal to the number of cell frequencies that may be filled in freely when we are given the marginal totals and the grand total, and in this illustration that number is 1. A simple formula providing the correct number of degrees of freedom is

$$\nu = (r - 1)(c - 1).$$

Hence, for our example, $\nu = (2 - 1)(2 - 1) = 1$ degree of freedom.

We want to measure how much the observed frequencies differ collectively from their corresponding expected frequencies. We do this with the chi-squared test statistic

$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$

where the summation extends over all the cells in the r x c contingency table. We have

$$\chi^2 = \frac{(70 - 65)^2}{65} + \frac{(60 - 65)^2}{65} + \frac{(30 - 35)^2}{35} + \frac{(40 - 35)^2}{35} = 2.1978$$

Using a chi-squared table, we can see that for $\nu = 1$, the critical value for the 5% significance level is $\chi^2 = 3.841$. Since the calculated value of $\chi^2$, 2.1978, does not fall within the critical region, we do not reject the hypothesis that there is no relationship between a person's gender and the person's possession of a handphone.
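
The handphone example can be sketched in code as follows. The functions `expected_frequencies` and `chi2_independence` are illustrative names, not part of the text; they apply the expected-frequency formula (row total)(column total)/(total sample size) and the degrees of freedom (r - 1)(c - 1) to any table given as a list of rows.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi2_independence(table):
    """Return (statistic, degrees of freedom, expected table) for an r x c table."""
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected

# Rows: male, female. Columns: owns a handphone (yes), (no).
handphone = [[70, 60], [30, 40]]
stat, dof, expected = chi2_independence(handphone)
critical_value = 3.841               # table value, 1 degree of freedom, p = 0.95
reject_h0 = stat > critical_value

print("expected:", expected)
print(f"chi-squared = {stat:.4f}, dof = {dof}, reject H0: {reject_h0}")
```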

Example 5

The following data show the attitude of housewives in various parts of the country to a certain brand of detergent.

Attitude       North    Central    South
Like           46       21         31
Indifferent    25       58         35
Dislike        16       37         42

Test the hypothesis that the attitude to the newly introduced detergent is independent of geographical area of residence at the 1% significance level.

The given table is arranged to include the row and column totals.

Attitude       North    Central    South    Total
Like           46       21         31        98
Indifferent    25       58         35       118
Dislike        16       37         42        95
Total          87       116        108      311

Step 1: State the hypotheses
H0: There is no association between attitude and location
H1: There is an association between attitude and location

Step 2: Specify the significance level
Given $\alpha = 0.01$

Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared test for independence to determine whether there is any significant association between the two categorical variables.

As with the goodness-of-fit test described earlier, the key idea of the chi-squared test for independence is a comparison of observed and expected frequencies. The expected frequency for each cell of the table can be generated using the following formula:

$$\text{Expected frequency} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$$

In fact, for a 3 x 3 contingency table, only four expected values in the top two rows of the table need to be calculated, and the remaining five expected values are found by subtraction. For example, the expected frequency for attitude "like" and location "north" is $\frac{98 \times 87}{311} = 27.41$. In this way, the table of both observed and expected frequencies is as shown below.

Attitude       North         Central       South         Total
Like           46 (27.41)    21 (36.55)    31 (34.04)     98
Indifferent    25 (33.01)    58 (44.01)    35 (40.98)    118
Dislike        16 (26.58)    37 (35.44)    42 (32.98)     95
Total          87            116           108           311

The number of degrees of freedom is $\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.

The chi-squared test statistic is

$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$

$$= \frac{(46 - 27.41)^2}{27.41} + \frac{(21 - 36.55)^2}{36.55} + \frac{(31 - 34.04)^2}{34.04} + \frac{(25 - 33.01)^2}{33.01} + \frac{(58 - 44.01)^2}{44.01} + \frac{(35 - 40.98)^2}{40.98} + \frac{(16 - 26.58)^2}{26.58} + \frac{(37 - 35.44)^2}{35.44} + \frac{(42 - 32.98)^2}{32.98}$$

$$= 33.5057$$

Step 4: Determine the critical region
From the chi-squared table, the critical value of $\chi^2$ for 4 degrees of freedom at the 1% level is 13.28.

Step 5: Make a decision
As the calculated value 33.51 is greater than the critical value 13.28, we conclude that there is evidence to reject H0; that is, attitude to the new detergent and geographical area of residence are not independent.
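
A sketch of the Example 5 calculation is given below. In addition to the test statistic, it keeps the contribution of each cell, which shows where the departure from independence is largest; this per-cell breakdown is an illustration and not a step taken in the text. Since the expected frequencies are not rounded to two decimal places here, the statistic comes out at roughly 33.49 rather than exactly 33.5057.

```python
observed = [
    [46, 21, 31],   # Like:        North, Central, South
    [25, 58, 35],   # Indifferent: North, Central, South
    [16, 37, 42],   # Dislike:     North, Central, South
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = (row total)(column total) / (total sample size).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to the chi-squared statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

stat = sum(sum(row) for row in contributions)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 13.28               # table value, 4 degrees of freedom, p = 0.99
reject_h0 = stat > critical_value

print(f"chi-squared = {stat:.2f}, dof = {dof}, reject H0: {reject_h0}")
for label, row in zip(["Like", "Indifferent", "Dislike"], contributions):
    print(label, [round(c, 2) for c in row])
```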

Exercise 6.3

1. An experiment has 500 observations and the data are classified into a 4 x 6 contingency table. Suppose we conduct a chi-squared test of independence at the 1% significance level. Assume the calculated value of the chi-squared test statistic is 39.2.
   (a) Determine the number of degrees of freedom.
   (b) Find the critical value for the chi-squared test of independence.
   (c) Determine whether the chi-squared test value falls into the critical region.

2. The following 3 x 2 contingency table contains observed values for a sample of size 250. Determine whether the row and column variables are independent using the chi-squared test with $\alpha$ = 0.025.

X
A
B 25
55

,Y
)/
32 38

63

3. A research group performs a study on gender and handedness (right- or left-handed). 800 individuals are randomly chosen from a very large population. The following contingency table displays the distribution of the two categories.

             Right-handed    Left-handed
   Male      344             72
   Female    352             32

   Test the hypothesis that gender is independent of handedness at the 5% significance level.

4. Consider a sample of 200 customers. For each customer, we have information on gender and preference of food. A contingency table for these data is shown below.

             Indian    Japanese    Western
   Male      40        20          50
   Female    20        50          20

   Carry out a test, at the 5% significance level, to determine whether there is any association between gender and preference of food.

5. In an experiment to study the association between diabetes and smoking habits, the following data are obtained.

                       Diabetes    No diabetes
   Nonsmokers          25          40
   Moderate smokers    30          21
   Heavy smokers       18          16

   Using a 1% significance level, test the hypothesis that there is no association between cigarette smoking and the risk of diabetes.

6. A camera manufacturer has four suppliers of lenses. The table below shows the numbers of good and defective lenses supplied by the suppliers.

                 Good    Defective
   Supplier 1     95      5
   Supplier 2    180     15
   Supplier 3    134     16
   Supplier 4    138      7

   Test, at the 5% significance level, whether the supplier is associated with the lens quality. What is your advice to the purchasing department based on the test result?

7. The table shows the result of a taste test in which a random sample of 500 people in two age groups is asked which of four formulations of a chocolate drink they prefer.

   Age group    Formulation A    Formulation B    Formulation C    Formulation D
   7-25         30               69               116              78
   26-50        28               36                70              73

   Use a 0.01 significance level to test whether the preference for the different formulations changes with age.

8. Fruit trees are subject to a bacteria-caused disease. Several different treatments for this disease are adopted. Treatment A: no action taken; treatment B: careful removal of clearly affected branches; and treatment C: frequent spraying of the leaves with an antibiotic in addition to careful removal of clearly affected branches. There are a few different outcomes from the disease. Outcome 1: tree dies in the same year as the disease is noticed; outcome 2: tree dies 2-4 years after the disease is noticed; outcome 3: tree survives beyond 4 years. A group of 200 trees are assigned to one of the treatments, and over the next few years the outcome is recorded. The results are displayed in the following contingency table.
Treatment
Outcome
1

A
37

B
24 20
15

t7
32

2 J

l6
J

36

Determine whether there is any substantial evidence to conclude that outcome is independent of treatment. Use a 5% significance level for this test.

9.

The table below shows the observed distribution of blood types A, B, AB and O in three samples of Malays living in Kedah, Selangor and Johor.

Blood type

Kedah

Selangor
205
184

|ohor 4t
37

A
B

t4
16
3

AB

5l
232

1l

O

17

51

Test, at the 5% significance level, whether the distribution of blood type is different across the three states.

10.

A manufacturer operates four assembly machines on three separate shifts daily. The table below gives the number of machine breakdowns recorded in the past year.

Machine First shift


Second shift
75

Machine 2
89
108

Machine 3
43 63

Machine 4
28
59

90
141

Third shift

175

121

141

Determine whether these data provide sufficient evidence, at the 2.5% significance level, to infer that machine breakdown is independent of shift.

Summary

1. The chi-squared distribution has one parameter, called the number of degrees of freedom.
2. The chi-squared distribution curve lies to the right of the vertical axis and is skewed to the right.
3. In a goodness-of-fit test, we test the null hypothesis that the observed frequencies follow a certain distribution.
4. In a test of independence, we test the null hypothesis that two attributes are independent.
5. General test procedure in a chi-squared test:
   • State the hypotheses
   • Specify the significance level
   • Calculate the value of the chi-squared test statistic $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected frequencies of class $i$ (combine any adjacent classes where necessary)
   • Determine the critical region based on the number of degrees of freedom and the significance level
   • Make a decision
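The general test procedure listed in the summary can be sketched in a few lines of Python. This is a minimal illustration only: the function names `chi_squared_statistic` and `decide` are hypothetical, the die data are invented for the example, and the critical value 11.070 is assumed from a standard chi-squared table for 5 degrees of freedom at the 5% level.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(statistic, critical_value):
    """Reject H0 when the statistic falls in the upper-tail critical region."""
    return "reject H0" if statistic > critical_value else "do not reject H0"


# Hypothetical data: a die thrown 120 times. H0: the die is fair.
observed = [25, 17, 15, 23, 24, 16]
expected = [120 / 6] * 6              # 20 per face under H0
statistic = chi_squared_statistic(observed, expected)
critical_value = 11.070               # assumed: table value, 5 degrees of freedom, 5% level
print(round(statistic, 2), decide(statistic, critical_value))
```

The critical value is passed in rather than computed so that the sketch mirrors the use of statistical tables in the worked procedure.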

REVISION EXERCISE
1. (a)
Find P(0.83

< x1 <

12.8)

(b)
)

Determine the value of k such that P(6.447

X'r,

<

k) =

O.Oag.

2.

Three identical dice are thrown 150 times. At each throw, the number of dice showing an odd score on the top face is recorded. The results are as follows:

Number of odd scores     0     1     2     3
Frequency               33    59    43    15

Using a 5% significance level, test the hypothesis that all three dice are unbiased.
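For a question of this kind, the expected frequencies come from a binomial model: if all three dice are unbiased, the number of odd scores in a throw follows B(3, 0.5). The sketch below is one way to check a hand calculation. It assumes the frequencies are 33, 59, 43 and 15 for 0, 1, 2 and 3 odd scores (they sum to the 150 throws), and it takes the critical value 7.815 from a standard chi-squared table for 3 degrees of freedom at the 5% level.

```python
from math import comb

observed = [33, 59, 43, 15]          # throws with 0, 1, 2, 3 odd scores (as read from the table)
n_throws = sum(observed)             # 150
p_odd = 0.5                          # probability of an odd score on an unbiased die

# Expected frequencies under B(3, 0.5).
expected = [n_throws * comb(3, k) * p_odd**k * (1 - p_odd)**(3 - k) for k in range(4)]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # no parameter is estimated from the data
critical_value = 7.815               # assumed: table value for 3 degrees of freedom, 5% level

print(expected)
print(round(statistic, 2), "reject H0" if statistic > critical_value else "do not reject H0")
```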

3.

A departmental store sells men's shirts and stocks these shirts in five different sizes: S, M, L, XL and XXL. The number of shirts of each size sold each week is recorded.

Size                S     M     L     XL    XXL
Number of shirts    21    24    39    25    13

Test, at the 10% significance level, the hypothesis that the number of shirts sold is uniformly distributed across the sizes.

4.

Cars heading to a certain junction may go straight, turn left or turn right. A road transport department officer asserts that 60% of the cars will go straight at the intersection and that, of the remaining 40%, equal proportions will turn left and right. One hundred cars are randomly monitored and it is found that 51 cars go straight, 17 cars turn left and 32 cars turn right. Test, at the 5% significance level, the hypothesis that the proportions of cars going straight, turning left and turning right do not differ significantly from those asserted by the officer.

5.

A pharmaceutical company conducts a trial on 200 patients to determine the effectiveness of a new cough remedy. Of these patients, 100 are randomly selected to be given the standard cough remedy and the remaining 100 are assigned the new cough remedy. The results are recorded as shown.

               Standard cough remedy    New cough remedy
No relief               53                     37
Some relief             34                     44
Full relief             13                     19

Carry out a test, at a significance level of 5%, to investigate whether the two cough remedies are equally effective.
6.

A football fan keeps a record of the goals scored per match by his favourite team. The results are shown below.

Goals obtained per match

34 11 16 25
14

Number of matches

(a) Compute the mean number of goals scored per match.
(b) Using a 5% significance level, perform a test of the hypothesis that the number of goals per match has a Poisson distribution.

7.

The following table gives the cumulative frequency distribution of the lives (in years) of 40 notebook batteries tested by a battery manufacturer.

Battery life not greater than Cumulative frequency


Based on the previous experience,

1.5

2.0
2

2.5
J

3.0

3.5 22

4.0
32

4.5

5.0

3t

40

it is believed that a normal distribution with mean 3.5 years and standard deviation 0.7 year provides a good approximation. Perform a chi-squared test, at the 5% significance level, to determine whether the normal distribution gives a good fit for these data.

8.

The table below shows the frequency distribution of marks for a paper obtained by 178 candidates.

Mark, x       Number of candidates
50<x<60                 5
40<x<50                19
30<x<40                34
20<x<30                63
10<x<20                47
0<x<10                 10

The population mean and standard deviation of the distribution of marks for the paper are 26.0 and 11.5 respectively. Test, at the 10% significance level, the hypothesis that the distribution of marks for the paper is normal.
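When the hypothesised distribution is normal, the expected frequency of each class is the total frequency multiplied by the probability that the normal variable falls in that class. The sketch below illustrates this for the marks data, using `statistics.NormalDist` from the Python standard library. It assumes the class frequencies are 10, 47, 63, 34, 19 and 5 in ascending order of marks, treats the two end classes as open-ended so that the probabilities sum to one (a modelling choice, not something stated in the question), and combines the top class with its neighbour when its expected frequency is below 5.

```python
from statistics import NormalDist

model = NormalDist(mu=26.0, sigma=11.5)
boundaries = [10, 20, 30, 40, 50]        # inner class boundaries
observed = [10, 47, 63, 34, 19, 5]       # classes 0-10, 10-20, ..., 50-60 (as read from the table)
total = sum(observed)                    # 178

# Class probabilities; the two end classes are treated as open-ended (an assumption).
cdf = [0.0] + [model.cdf(b) for b in boundaries] + [1.0]
expected = [total * (upper - lower) for lower, upper in zip(cdf, cdf[1:])]

# Combine the top class with its neighbour while its expected frequency is below 5.
obs, exp = list(observed), list(expected)
while len(exp) > 1 and exp[-1] < 5:
    last_o, last_e = obs.pop(), exp.pop()
    obs[-1] += last_o
    exp[-1] += last_e

statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 1                        # mean and standard deviation are given, not estimated
print([round(e, 2) for e in exp], round(statistic, 2), df)
```

The statistic would then be compared with the tabulated critical value for the resulting number of degrees of freedom at the 10% level.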


9.

A botanist sows three seeds in each of 80 pots. The number of seeds which germinate in each pot is recorded. The results for all 80 pots are given in the following table.

Number of seeds germinate Number of pots

0 25

20

29

(a) Estimate the probability that an individual seed germinates.
(b) Using a 1% significance level, test the hypothesis that the data may be fitted by the binomial distribution.

10.

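The distribution of marks for a paper in an examination has mean $\mu$ and standard deviation $\sigma$. Each candidate is assigned one of the five grades A, B, C, D, E as follows:

Mark, x                                                        Grade
$x \ge \mu + \frac{3\sigma}{2}$                                  A
$\mu + \frac{\sigma}{2} \le x < \mu + \frac{3\sigma}{2}$         B
$\mu - \frac{\sigma}{2} \le x < \mu + \frac{\sigma}{2}$          C
$\mu - \frac{3\sigma}{2} \le x < \mu - \frac{\sigma}{2}$         D
$x < \mu - \frac{3\sigma}{2}$                                    E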

The table below summarises the grades of a random sample of 198 candidates.

Grade Number of candidates

B
55

C
81

D
JJ

t7

t2

Determine, at the 1% significance level, the adequacy of a normal distribution as a model for these
data.

11.

The lengths (in millimetres) of a random sample of 50 leaves of a certain plant are recorded as follows:

132 150 168 138 150 145 155 138 163 156
133 136 177 135 147 125 144 165 147 142
157 158 118 153 128 165 147 154 146 144
138 143 151 148 152 140 148 146 126 163
121 140 140 173 142 135 145 151 135 161

Test the hypothesis that the leaf length can be approximately modelled by a normal distribution. Use a 0.05 significance level.


12.

The table below shows the number of individuals exposed to a certain virus and the number of individuals who develop the disease.

                          Development of disease
                          Yes        No
Exposure to     Yes        44        19
virus           No        116       128

Conduct a test of hypothesis, at the 1% significance level, to determine whether there is an association between the exposure to the virus and the development of the disease.

13.

The table below shows the number of males and females in each of three employment categories at a manufacturing company.

Managerial Support Male Female


10

Worker
285 624

39
52

Using a 1% significance level, test whether there is any association between gender and employment categories.

14.

A researcher in a study of heart disease in males links subjects to socioeconomic status and smoking habits. The results are summarised in the contingency table below.

Socioeconomic status

High

Middle
29 27

Low
55 36 30
Smoking habits

Current Former Never

66 19 gg

lz

Perform a chi-squared test on the association between smoking habits and socioeconomic status. Use a 2.5% significance level.

15.

A hypermarket wants to study the relationship between the method of payment and the age group of its customers. A random sample of 250 customers is taken and the results are summarised in the table below.

                      Age group
                      18-25   26-35   36-45   Over 46
Payment     Card        18      36      25       30
method      Cash        14      27      33       67

Carry out a test, at the 5% significance level, to find out whether the method of payment is independent of age group.

16.

The School of Biological Sciences of a university records the level of exposure to a certain pollutant and the number of brain abnormalities for laboratory mice. The data are summarised in the table below.
Number of brain abnormalities

0-2
Level of
exposure to

3-4
18
7 8

5-6
39
13 8

High Medium
Iow

t2
8

pollutant

Test, at the 5% significance level, whether there is an association between the level of exposure to the pollutant and the number of brain abnormalities found in the laboratory mice.

17.

The table below summarises the number of hours of sleep at night for a random sample of adults of different age groups.

Number of hours of sleep


Less than

6 to

More than 8
70 62 43

Age group

25-44 45-54
>_ 55

41 34 76

85 77 69

Carry out a test, at the 1% significance level, to determine whether the number of hours of sleep is independent of the age of an adult.
18.

A plant expert collects samples of rice from a large field of 600 plots. One part of his investigation is based on the sterility observed and the genotype used for each plot.

                           Genotype
                           I      II     III    IV
Sterility   No problem     30     21     19     16
            Moderate      102     90    120     77
            Severe         18     39     11     57

Test, at a 1% significance level, whether sterility is independent of genotype.