
UECM3253: Applied Nonparametric Statistics UTAR: May 2013

Prepared by: Chang Yun Fah 1


CHAPTER 5
PROCEDURES THAT UTILIZE DATA FROM
THREE OR MORE INDEPENDENT SAMPLES


In this chapter, we shall be interested in testing the null hypothesis that the several samples
have been drawn from the same population or from populations with equal location parameters. The
parametric counterpart to most of these procedures is the one-way analysis of variance F test,
assuming that the samples are randomly and independently drawn from normally distributed
populations with equal variances. An advantage of the procedures discussed in this chapter is that
their validity does not depend on such restrictive assumptions.

5.1) Extension of The Median Test

Assumptions:
A. Each sample is a random sample of size $n_i$ drawn from one of c populations of interest with
unknown medians $M_1, M_2, \ldots, M_c$.
B. The observations are independent both within and among samples.
C. The measurement scale employed is at least ordinal.
D. If all populations have the same median, then for each population the probability p is the
same that an observed value exceeds the grand median.

Hypotheses:
$H_0: M_1 = M_2 = \cdots = M_c$ (or the populations are homogeneous with respect to the proportion
of observations falling above and below the common population median)
$H_1$: At least one population has a median different from at least one of the others.

Test Statistic:
Combine the c samples, order them and compute the combined sample median.
Classify each observation according to the sample (or population) to which it belongs and
according to whether it is larger than, equal to, or less than the median.
Display the results in a two-way contingency table

                                  Sample
                     1       2       3       ...     c       Total
> Sample median      O_11    O_12    O_13    ...     O_1c    n_1.
≤ Sample median      O_21    O_22    O_23    ...     O_2c    n_2.
Total                n_.1    n_.2    n_.3    ...     n_.c    N

where $O_{ij}$ is the observed frequency of observations falling in the ith group (row) of the jth
sample, $n_{1.}$ is the total number of observations larger than the combined sample median, and
$n_{2.}$ is the total number of observations that are equal to or smaller than the combined sample
median.
Calculate the expected cell frequencies using
$$E_{ij} = \frac{n_{i.}\, n_{.j}}{N}.$$
The test statistic is
$$X^2 = \sum_{i=1}^{2} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.$$
Decision Rule:
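Reject $H_0$ if $X^2 > \chi^2_{\alpha,\,c-1}$, the upper-$\alpha$ critical value of the chi-square
distribution with $c - 1$ degrees of freedom.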

Example 5.1:
In a study designed to determine the distribution of myocardial water and the cellular
concentrations of cardiac electrolytes, a doctor used the tracer method to measure the extracellular
space in the ventricular muscle of two groups of nephrectomized rats and one group of intact rats.
The table below shows the results. We wish to know whether we may conclude from these data that the
population medians are different. (R1. pg. 222)
Nephrectomized rats Intact rats
Group 1 Group 2 Group 3
0.185 0.189 0.219
0.187 0.193 0.204
0.209 0.176 0.219
0.194 0.195 0.234
0.175 0.169 0.233
0.197 0.183 0.194
0.188 0.185 0.209
0.185 0.179 0.195
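
The steps of the extended median test can be sketched in a few lines of Python. This is only an illustrative sketch applied to the data of Example 5.1; the function name `median_test` and its return layout are assumptions made here, not part of the course material.

```python
from statistics import median


def median_test(samples):
    """Return (X2, table) for the extension of the median test.

    table[0][j] counts values of sample j above the combined median,
    table[1][j] counts values at or below it.
    """
    pooled = [x for s in samples for x in s]
    grand_median = median(pooled)
    above = [sum(x > grand_median for x in s) for s in samples]
    at_or_below = [len(s) - a for s, a in zip(samples, above)]
    table = [above, at_or_below]

    n_total = len(pooled)
    row_totals = [sum(row) for row in table]
    col_totals = [len(s) for s in samples]

    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n_total  # E_ij
            x2 += (observed - expected) ** 2 / expected
    return x2, table


group1 = [0.185, 0.187, 0.209, 0.194, 0.175, 0.197, 0.188, 0.185]
group2 = [0.189, 0.193, 0.176, 0.195, 0.169, 0.183, 0.185, 0.179]
group3 = [0.219, 0.204, 0.219, 0.234, 0.233, 0.194, 0.209, 0.195]

x2, table = median_test([group1, group2, group3])
print(table)
print(round(x2, 3))
```

The printed value of $X^2$ is then compared with the chi-square critical value for $c - 1 = 2$ degrees of freedom at the chosen level of significance.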

5.2) Kruskal-Wallis One Way Analysis of Variance By Ranks
This is the most widely used nonparametric technique for testing the null hypothesis that
several samples have been drawn from the same or identical populations. When only two samples
are being considered, the Kruskal-Wallis test is equivalent to the Mann-Whitney test discussed in
Chapter 3. The Kruskal-Wallis test is more powerful than the median test because it uses more of
the information in the data (the ranks of the observations rather than only their position
relative to the median).

Assumptions:
A. The data for analysis consist of k random samples of sizes $n_1, n_2, \ldots, n_k$.
B. The observations are independent both within and among samples.
C. The variable of interest is continuous.
D. The measurement scale is at least ordinal.
E. The populations are identical except for a possible difference in location for at least one
population.

Hypotheses:

$H_0$: The k population distribution functions are identical. ($M_1 = M_2 = \cdots = M_k$)
$H_1$: The k populations do not all have the same median.

Test Statistic:
Compute the table below. Record each actual value, then replace it by its rank in the combined
sample.

                                       Sample
                        1            2            3            ...   k
                        X_{1,1}      X_{2,1}      X_{3,1}      ...   X_{k,1}
                        X_{1,2}      X_{2,2}      X_{3,2}      ...   X_{k,2}
                        ...          ...          ...                ...
                        X_{1,n_1}    X_{2,n_2}    X_{3,n_3}    ...   X_{k,n_k}
Sum of the ranks        R_1          R_2          R_3          ...   R_k
Expected sum of ranks   n_1(N+1)/2   n_2(N+1)/2   n_3(N+1)/2   ...   n_k(N+1)/2
if H_0 is true
The test statistic is
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{1}{n_i} \left[ R_i - \frac{n_i(N+1)}{2} \right]^2,$$
which is approximately $\chi^2_{k-1}$ distributed under $H_0$, or, in a computationally more
convenient form,
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1),$$
where $R_i$ is the sum of the ranks assigned to observations in the ith sample, $n_i(N+1)/2$ is the
expected sum of ranks for the ith treatment under $H_0$, and $N = \sum_{i=1}^{k} n_i$.

Decision Rule:
1. If there are only $k = 3$ samples, and each sample has $n_i \le 5$ observations, the critical value
is obtained from the Kruskal-Wallis table (Table A12).
Reject $H_0$ if $H$ > the critical value from Table A12 for the preselected value of $\alpha$ or the
nearest tabulated value.
2. For more than 3 samples or $n_i > 5$, we use the chi-squared distribution with $k - 1$ degrees of
freedom (Table A11).
Reject $H_0$ if $H > \chi^2_{\alpha,\,k-1}$, the upper-$\alpha$ critical value.

Example 5.2:
A study reported the data below on cortisol levels in three groups of patients who were
delivered between 38 and 42 weeks gestation. Group I was studied before the onset of labor at
elective Caesarean section, group II was studied at emergency Caesarean section during induced
labor, and group III consisted of patients in whom spontaneous labor occurred and who were
delivered either vaginally or by Caesarean section. We wish to know whether these data provide
sufficient evidence to indicate a difference in median cortisol levels among the three populations
represented.
Group I 262 307 211 323 454 339 304 154 287 356
Group II 465 501 455 355 468 362
Group III 343 772 207 1048 838 687
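
As an illustration, the computational form of H can be evaluated for the cortisol data above with a short Python sketch. The helper names `rank_pooled` and `kruskal_wallis_h` are hypothetical, and the code applies no correction for ties (these data contain none).

```python
def rank_pooled(samples):
    """Midranks of all observations in the pooled sample, grouped by sample."""
    pooled = sorted(x for s in samples for x in s)
    positions = {}
    for pos, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(pos)
    # a tied value receives the mean of the positions it occupies
    midrank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [[midrank[x] for x in s] for s in samples]


def kruskal_wallis_h(samples):
    """Return (H, rank sums) using H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1)."""
    ranks = rank_pooled(samples)
    n_total = sum(len(s) for s in samples)
    rank_sums = [sum(r) for r in ranks]
    h = 12 / (n_total * (n_total + 1)) * sum(
        r_sum ** 2 / len(s) for r_sum, s in zip(rank_sums, samples)
    ) - 3 * (n_total + 1)
    return h, rank_sums


group_i = [262, 307, 211, 323, 454, 339, 304, 154, 287, 356]
group_ii = [465, 501, 455, 355, 468, 362]
group_iii = [343, 772, 207, 1048, 838, 687]

h, rank_sums = kruskal_wallis_h([group_i, group_ii, group_iii])
print(rank_sums)
print(round(h, 3))
```

Since some $n_i > 5$ here, the resulting H is compared with the chi-square critical value for $k - 1 = 2$ degrees of freedom.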

1) Correction for Ties
If there are a substantial number of ties, we may want to adjust the test statistic. The
adjusted test statistic becomes
$$H_C = \frac{H}{1 - \sum T / (N^3 - N)},$$
where $T = t^3 - t$ for each group of $t$ tied observations.

Note: The effect of the adjustment is to inflate the value of the test statistic. Thus if $H$ is
significant at the desired level of significance without the adjustment, there is no point in
computing $H_C$.

5.3) Jonckheere-Terpstra Test for Ordered Alternatives
In some applications of parametric statistical procedures, it is appropriate to test the null
hypothesis of equality among population means against an alternative in which order is specified,
say $H_1: \mu_1 \le \mu_2 \le \cdots \le \mu_k$. This alternative is sometimes more meaningful than the
common one, $H_1$: Not all means are equal. A sociologist, for example, may be interested in knowing
whether people in low, middle, and high socioeconomic groups possess low, middle and high knowledge
of certain current issues. Alternative hypotheses of this type are referred to as ordered
alternatives.

Assumptions:
A. The data for analysis consist of k random samples of sizes $n_1, n_2, \ldots, n_k$ from populations
1, 2, ..., k, with unknown medians $M_1, M_2, \ldots, M_k$, respectively.
B. The observations are independent, both within and among samples.
C. The variable of interest is continuous.
D. The measurement scale is at least ordinal.
E. The sampled populations are identical except for a possible difference in location
parameters.

Hypotheses:
$H_0: M_1 = M_2 = \cdots = M_k$
$H_1: M_1 \le M_2 \le \cdots \le M_k$, with at least one strict inequality.
Note: If the expected direction of inequality is not as specified in this alternative hypothesis, relabel
and reorder the samples to achieve conformity.

Test Statistic:
The test statistic is
$$J = \sum_{i<j} U_{ij},$$
where $U_{ij}$ is the number of pairs of observations $(a, b)$ for which $X_{ia} < X_{jb}$. In other
words, for each pair of samples $i < j$ we compare each observation in the first sample with each
observation in the second sample; if the observation from the first sample is less than the
observation from the second sample, we record a score of 1. We record a score of 0 if the
observation from the first sample is greater than the observation from the second sample.

Decision Rule:
Reject $H_0$ if $J \ge J_{\alpha,\,k,\,n_1, n_2, \ldots, n_k}$ (refer to Table A13 for $k = 3$ and
$n_1 \le n_2 \le n_3$).

Note: 1) J has certain symmetry properties.
2) We may obtain critical values for configurations not in the order $n_1 \le n_2 \le n_3$ by
rearranging the three sample sizes so that they are in order of increasing size before we enter
the table.

Example 5.3:
A researcher, Nappi investigated the changes occurring in the haemocytes of larvae of
Drosophila algonquin during parasitization by the hymenopterous parasite Pseudeucoila bochei.
Twenty-seven hours after parasitization of Drosophila algonquin larvae, differential counts (%) of
plasmatocytes were made on three groups: host larvae in which reaction was successful (S), those in
which the reaction was unsuccessful (U), and those in which there was no visible host reaction (N).
The results are shown in table below. We wish to test the null hypothesis of no difference among
the three groups against the alternative that the differential counts of plasmatocytes (%) decrease in
the three groups from group N to group S. (R1. pg. 236).
S U N
54.0 79.8 98.6
67.0 82.0 99.5
47.2 88.8 95.8
71.1 79.6 93.3
62.7 85.7 98.9
44.8 81.7 91.1
67.4 88.5 94.5
80.2

1) Large Sample Approximation
$$z = \frac{J - \left[ N^2 - \sum_{j=1}^{k} n_j^2 \right] / 4}{\sqrt{\left[ N^2(2N+3) - \sum_{j=1}^{k} n_j^2 (2n_j + 3) \right] / 72}} \sim N(0,1)$$



Example 5.4:
Consider Example 5.3 and apply large sample approximation.
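
The Jonckheere-Terpstra statistic and its large-sample approximation can be sketched as follows. The three small samples are made-up numbers used only to show the mechanics (they are not the data of Example 5.3), and the function names are hypothetical.

```python
from math import sqrt


def jonckheere_j(samples):
    """J = sum over i < j of U_ij, where U_ij counts pairs with X_ia < X_jb."""
    j_stat = 0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            j_stat += sum(1 for a in samples[i] for b in samples[j] if a < b)
    return j_stat


def jonckheere_z(samples):
    """Large-sample standardized value of J."""
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    mean_j = (n_total ** 2 - sum(n ** 2 for n in sizes)) / 4
    var_j = (n_total ** 2 * (2 * n_total + 3)
             - sum(n ** 2 * (2 * n + 3) for n in sizes)) / 72
    return (jonckheere_j(samples) - mean_j) / sqrt(var_j)


# Hypothetical samples, ordered as in the alternative M_1 <= M_2 <= M_3.
toy_samples = [[1, 3, 5], [2, 4, 6], [7, 8, 9]]

j_value = jonckheere_j(toy_samples)
z_value = jonckheere_z(toy_samples)
print(j_value, round(z_value, 3))
```

With samples this small the exact table would normally be used; the z value is shown only to illustrate the formula.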


5.4) Multiple Comparisons
When a hypothesis testing procedure such as the Kruskal-Wallis test leads us to reject the
null hypothesis and thus to conclude that not all sampled populations are identical, we naturally
ask which populations differ from which others. It is probably of greater interest and
importance to be able to say more about the differences. For example, we would like to know
whether the medians $M_1$, $M_2$ and $M_3$ are all different from each other or whether the difference
is between $M_1$ and $M_2$ only, between $M_1$ and $M_3$ only, or between $M_2$ and $M_3$ only.
The logical approach to answering this question might appear to be to use some procedure
such as the Mann-Whitney test to test for a significant difference between each of all the possible
pairs of samples. There is, however, a problem inherent in following such a course: testing all
possible pairs in the usual way affects the probability of rejecting a true null hypothesis. If
we carry out C independent comparisons between pairs of samples, each at a stated significance
level of $\alpha$, the probability of declaring at least one difference significant as a result of
chance is equal to $1 - (1 - \alpha)^C$, which is approximately equal to $C\alpha$ for small values of
$\alpha$. Consequently, in the typical situation, the probability of finding at least one spurious
significant outcome increases as the number of independent comparisons increases. Finding the
corresponding probability in the case of nonindependent comparisons is more complicated.
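
To see how quickly the chance of at least one false rejection grows, the expression $1 - (1 - \alpha)^C$ can be tabulated next to its approximation $C\alpha$. The function name below is hypothetical, and the calculation assumes independent comparisons, as in the paragraph above.

```python
def familywise_error(alpha, comparisons):
    """P(at least one false rejection) for independent comparisons at level alpha."""
    return 1 - (1 - alpha) ** comparisons


for c in (1, 3, 6, 10):
    exact = familywise_error(0.05, c)
    approx = c * 0.05
    print(c, round(exact, 4), round(approx, 4))
```

The approximation $C\alpha$ always overstates the exact value slightly, and the gap widens as C grows.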
One way to circumvent this problem is to use a multiple-comparison procedure that
incorporates an adjustment to the level of significance. Several such procedures are available.
The one considered here was suggested by Dunn (1964); it is appropriate for use following a
Kruskal-Wallis test. When we apply this multiple-comparison procedure, we use an
experimentwise error rate, which represents a conservative approach to making multiple
comparisons: it holds the probability of making only correct decisions at $1 - \alpha$ when the null
hypothesis of no difference among populations is true. This approach protects well against error
when $H_0$ is true, but it makes it more difficult to detect differences that are significant when
the null hypothesis is false.

Algorithm:
Obtain the mean of the ranks for each sample. (Let $\bar{R}_i$ be the mean of the ranks of the ith
sample and $\bar{R}_j$ be the mean of the ranks of the jth sample.)
Select an experimentwise error rate of $\alpha$, which we think of as an overall level of
significance. It is determined in part by k, the number of samples involved, and is larger for
larger k. There will be a total of $k(k-1)/2$ pairs of samples that can be compared a pair at a
time. We usually select a value of $\alpha$ larger than those customarily encountered in
single-comparison inference procedures, e.g. 0.15, 0.20 or 0.25, depending on the size of k.
Form the inequality (for unequal sample sizes)
$$|\bar{R}_i - \bar{R}_j| \le z_{1-[\alpha/k(k-1)]} \sqrt{\frac{N(N+1)}{12} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
where N is the number of observations in all samples combined and $z_{1-[\alpha/k(k-1)]}$ is the
critical value of the standard normal distribution $N(0,1)$ obtained from Table A2.
If the k samples are all of the same size, the inequality reduces to
$$|\bar{R}_i - \bar{R}_j| \le z_{1-[\alpha/k(k-1)]} \sqrt{\frac{k(N+1)}{6}}$$
Any difference $|\bar{R}_i - \bar{R}_j|$ greater than the right-hand side of the inequality (for either
equal or unequal sample sizes) is declared significant at the $\alpha$ level.

Example 5.5:
We refer again to the data of Example 5.2, which we analyzed by using the Kruskal-Wallis
test. A computed value of the test statistic of $H = 9.232$ allowed us to reject, at the 0.01 level
of significance, the null hypothesis that the three populations were identical. As a result, we
concluded that median cortisol levels are not the same for all three types of patients studied. To
make all possible comparisons in order to locate just where the differences occur, let us choose an
error rate of $\alpha = 0.15$.
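
A minimal sketch of the pairwise comparisons for this example, using the unequal-sample-size inequality. The rank sums 69, 90 and 94 are what one obtains by ranking the 22 observations of Example 5.2 together; the function name `dunn_comparisons` is hypothetical, and the normal quantile comes from Python's standard library rather than Table A2.

```python
from math import sqrt
from statistics import NormalDist


def dunn_comparisons(mean_ranks, sizes, alpha):
    """Return (i, j, |difference|, bound, significant) for every pair of samples."""
    k = len(sizes)
    n_total = sum(sizes)
    z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    results = []
    for i in range(k):
        for j in range(i + 1, k):
            bound = z * sqrt(
                n_total * (n_total + 1) / 12 * (1 / sizes[i] + 1 / sizes[j])
            )
            diff = abs(mean_ranks[i] - mean_ranks[j])
            results.append((i + 1, j + 1, diff, bound, diff > bound))
    return results


sizes = [10, 6, 6]
rank_sums = [69, 90, 94]  # Groups I, II, III (assumed from ranking the pooled data)
mean_ranks = [r / n for r, n in zip(rank_sums, sizes)]

results = dunn_comparisons(mean_ranks, sizes, alpha=0.15)
for i, j, diff, bound, significant in results:
    print(i, j, round(diff, 3), round(bound, 3), significant)
```

Each printed row gives the pair of groups, the absolute difference in mean ranks, the right-hand side of the inequality, and whether the difference is declared significant.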

1) Ties
If there are extensive ties in the data, we can adjust the inequalities above to ensure a
conservative result. When we adjust for ties, the appropriate inequality for unequal sample sizes is
$$|\bar{R}_i - \bar{R}_j| \le z_{1-[\alpha/k(k-1)]} \sqrt{\frac{N(N^2 - 1) - \left( \sum t^3 - \sum t \right)}{12(N-1)} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
The appropriate inequality for equal sample sizes is
$$|\bar{R}_i - \bar{R}_j| \le z_{1-[\alpha/k(k-1)]} \sqrt{\frac{k \left[ N(N^2 - 1) - \left( \sum t^3 - \sum t \right) \right]}{6N(N-1)}}$$
where t is the number of values in the combined sample that are tied at a given rank (as illustrated
in Example 2.6). The adjustment for ties usually has a negligible effect on the results.

2) Comparing all treatments with a control
Sometimes the research situation is such that one of the k treatments is a control condition.
When this is the case, the investigator is frequently interested in comparing each treatment with the
control condition without regard to whether the overall test for a treatment effect is significant, and
irrespective of any potential significant differences between other pairs of treatments. When interest
focuses on comparing all treatments with a control condition, there will be $k - 1$ comparisons to be
made. The procedure is the same as described for the case in which all possible pairs of treatments
are compared, except that $z_{1-[\alpha/k(k-1)]}$ is replaced by $z_{1-[\alpha/2(k-1)]}$.

Example 5.6:
A fertilizer manufacturer conducted an experiment to compare the effects of four types of
fertilizer on the yield of a certain grain. Homogeneous, equal-sized experimental plots of soil were
made available for the experiment. They were randomly assigned to receive one of the four
fertilizers, and plots receiving no fertilizer served as controls. At harvest time nine plots were
randomly selected from those previously assigned to each of the fertilizers and the control plots.
The yields (in coded form) for each plot are shown in the table below. (R1. pg. 243).
Fertilizer
1 2 3 4 5
None (0) A B C D
10.5 16.0 28.5 33.0 45.0
1.0 15.0 23.0 37.0 42.5
2.5 17.0 23.0 23.0 38.0
5.0 10.5 26.0 35.5 40.0
6.0 12.5 30.0 31.5 42.5
2.5 7.0 21.0 25.0 34.0
8.5 12.5 20.0 31.5 42.5
8.5 19.0 28.5 42.5 39.0
4.0 14.0 18.0 27.0 35.5
Total (R) 48.5 123.5 218.0 286.0 359.0
Mean ( R ) 5.39 13.72 24.22 31.78 39.89


-- End of Chapter 5 --
























CHAPTER 6
PROCEDURES THAT UTILIZE DATA FROM
THREE OR MORE RELATED SAMPLES

Frequently, we can greatly improve the ability to detect group differences in the variable of
interest by dividing subjects into homogeneous subgroups, called blocks, and then making
comparisons among subjects within the subgroups. We can do this by using a randomized complete
block design (two-way ANOVA). This technique extends the two-sample paired comparison model
discussed in Chapter 4 to the case in which several samples are available for analysis. Thus, for
three or more samples, a block is composed of three or more subjects, more generally referred to as
experimental units, who are more homogeneous with respect to each other than with respect to
subjects in another block. We could form blocks on the basis of age, education and physical
condition. In certain situations a single subject may be a block.

6.1) Friedman Two-Way Analysis Of Variance By Ranks
This test is a nonparametric analogue of the parametric two-way analysis of variance. We
perform calculations on ranks, which may be derived from observations measured on a higher scale
or may be the original observations themselves. The procedure may be used when for one reason or
another it is undesirable to use the parametric two-way ANOVA. For example, the investigator may
be unwilling to assume that the sampled populations are normally distributed, a requirement for the
valid use of the parametric test. Also, in some cases only ranks may be available for analysis.
The objective is to determine if we may conclude from sample evidence that there is a
difference among treatment effects. We reason that if treatments do not differ in their effects, the
median response of a population of subjects receiving a given treatment will be the same as the
median response of a population of subjects receiving any one of the other treatments under study,
after the effect of the blocking variable has been removed. Thus, if we are comparing k treatments
that have identical effects, $M_1 = M_2 = \cdots = M_k$, where $M_j$ is the median of the population
receiving the jth treatment, and $1 \le j \le k$.

Assumptions:
A. The data consist of b mutually independent samples (blocks) of size k. The typical
observation $X_{ij}$ is the jth observation in the ith sample (block). The data may be
displayed as in the table below, where the rows represent the blocks and the columns are
called treatments. The term treatment has a very general meaning; it may refer to a
treatment in the usual sense of the word, or it may refer to some other condition such as
socioeconomic status or educational level.
B. The variable of interest is continuous.
C. There is no interaction between blocks and treatments.
D. The observations within each block may be ranked in order of magnitude.

Hypotheses:

$H_0: M_1 = M_2 = \cdots = M_k$ versus $H_1$: At least one equality is violated.

Test Statistic:
Display the data for the Friedman two-way ANOVA by ranks as follows:

                              Treatment
Block     1       2       3       ...     j       ...     k
1         X_11    X_12    X_13    ...     X_1j    ...     X_1k
2         X_21    X_22    X_23    ...     X_2j    ...     X_2k
3         X_31    X_32    X_33    ...     X_3j    ...     X_3k
...       ...                                             ...
i         X_i1    X_i2    X_i3    ...     X_ij    ...     X_ik
...       ...                                             ...
b         X_b1    X_b2    X_b3    ...     X_bj    ...     X_bk

Convert the original observations to ranks (this is not necessary if the original observations
are ranks). The observations within each block are ranked separately from smallest to largest,
so each block contains a separate set of k ranks.

                              Treatment
Block     1       2       3       ...     j       ...     k
1         R_1     R_2     R_3     ...     R_j     ...     R_k
2         R_2     R_k     R_1     ...     R_j     ...     R_3
3         R_j     R_2     R_k     ...     R_3     ...     R_1
...       ...                                             ...
i         R_1     R_2     R_k     ...     R_3     ...     R_j
...       ...                                             ...
b         R_k     R_1     R_3     ...     R_j     ...     R_2

If $H_0$ is true and all treatments have identical effects, the rank that appears in a particular
column when the data are displayed in the above table is merely a matter of chance.
Consequently, when $H_0$ is true, neither small nor large ranks should tend to show a
preference for a particular column; that is, the ranks in each block should be randomly
distributed over the columns (treatments) in each block.
Obtain the sums of the ranks $R_j$ in each column. If $H_0$ is false, we expect at least one sum
to be sufficiently different in size from at least one other sum that we are reluctant to
attribute the difference to chance alone. The Friedman test statistic is defined as
$$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} \left[ R_j - \frac{b(k+1)}{2} \right]^2,$$
which is approximately $\chi^2_{k-1}$ distributed under $H_0$, and in which $b(k+1)/2$ is the mean of
the $R_j$'s under $H_0$. A sufficiently large value of $\chi_r^2$ will cause rejection of $H_0$.
The computational formula for the test statistic is
$$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1).$$
Alternatively, we may use
$$W = \frac{\chi_r^2}{b(k-1)} = \frac{12 \sum_{j=1}^{k} R_j^2 - 3b^2 k(k+1)^2}{b^2 k(k^2 - 1)}.$$

Decision Rule:
Reject $H_0$ at the $\alpha$ level of significance if $P(W \ge w;\, b, k) \le \alpha$, where the
probability associated with the computed value $w$ is obtained from Table A14.
For values of b and/or k not included in Table A14, we reject $H_0$ if
$\chi_r^2 \ge \chi^2_{(1-\alpha),(k-1)}$, the $(1-\alpha)$ quantile of the chi-square distribution with
$k - 1$ degrees of freedom (Table A11).
1) Ties
Theoretically, no ties should occur, since the variable whose values are ranked is assumed to
be continuous. In practice, however, ties do occur, and we give tied observations the mean of the
rank positions for which they are tied. Note that only ties within a given block are of concern.
The statistic adjusted for ties is
$$W = \frac{12 \sum_{j=1}^{k} R_j^2 - 3b^2 k(k+1)^2}{b^2 k(k^2 - 1) - b \sum (t^3 - t)}$$
where t is the number of observations tied for a given rank in any block.
Note: If there are only a few ties, we use the mean ranks with the unadjusted formula. If many ties
occur, we use the adjusted formula above.

Example 6.1:
Hall et al. compared three methods of determining serum amylase values in patients with
pancreatitis. The results are shown in Table below. We wish to know whether these data indicate a
difference among the three methods. (R1. pg. 265).
Method of determination
Specimen A B C
1 4000 3210 6120
2 1600 1040 2410
3 1600 647 2210
4 1200 570 2060
5 840 445 1400
6 352 156 249
7 224 155 224
8 200 99 208
9 184 70 227
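
The Friedman computations for these data can be sketched as below. The function names are hypothetical; ties within a block receive mean ranks, and the statistics are computed with the unadjusted formulas (no tie correction).

```python
def block_ranks(block):
    """Midranks of the values within one block (smallest value gets rank 1)."""
    ordered = sorted(block)
    return [
        (2 * ordered.index(x) + 1 + ordered.count(x)) / 2  # mean of tied positions
        for x in block
    ]


def friedman(blocks):
    """Return (treatment rank sums, chi_r^2, W) for a complete block layout."""
    b, k = len(blocks), len(blocks[0])
    ranks = [block_ranks(row) for row in blocks]
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    chi_r = 12 / (b * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * b * (k + 1)
    w = chi_r / (b * (k - 1))
    return rank_sums, chi_r, w


# Serum amylase values: columns are methods A, B, C; rows are specimens 1-9.
amylase = [
    [4000, 3210, 6120],
    [1600, 1040, 2410],
    [1600, 647, 2210],
    [1200, 570, 2060],
    [840, 445, 1400],
    [352, 156, 249],
    [224, 155, 224],
    [200, 99, 208],
    [184, 70, 227],
]

rank_sums, chi_r, w = friedman(amylase)
print(rank_sums)
print(round(chi_r, 3), round(w, 4))
```

Specimen 7 contains the only tie (methods A and C), so both receive rank 2.5 in that block.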

2) Use of Aligned Ranks
The Friedman test is based on b sets of ranks, and the treatments are ranked separately in
each set. Such a ranking scheme allows for intrablock comparisons only, since interblock
comparisons are not meaningful. When the number of treatments is small, this may pose a
disadvantage. When situations arise in which comparability among blocks is desirable, the method
of aligned ranks may be employed.
Subtract from each observation within a block some measure of location such as the block
mean or median. The resulting differences, called aligned observations, which keep their
identities with respect to the block and treatment combination to which they belong, are then
ranked from 1 to kb relative to each other (in the same way as for the Kruskal-Wallis test).
If there is no treatment effect, we would expect each of the blocks to receive approximately
the same sequence of aligned ranks. We would expect the treatment rank totals to be about
equal. In the absence of ties, the aligned-ranks test statistic for the randomized complete
block design is
$$T = \frac{(k-1) \left[ \sum_{j=1}^{k} \hat{R}_{.j}^2 - \frac{kb^2}{4}(kb+1)^2 \right]}{\frac{kb(kb+1)(2kb+1)}{6} - \frac{1}{k} \sum_{i=1}^{b} \hat{R}_{i.}^2},$$
which is approximately $\chi^2_{k-1}$ distributed under $H_0$, where
$\hat{R}_{i.}$ = rank total of the ith block, and
$\hat{R}_{.j}$ = rank total of the jth treatment.

If ties are present, replace the denominator of T with
$$\sum_{i=1}^{b} \sum_{j=1}^{k} \hat{R}_{ij}^2 - \frac{1}{k} \sum_{i=1}^{b} \hat{R}_{i.}^2$$
where $\hat{R}_{ij}$ = the aligned rank of the jth measurement in the ith block.

Example 6.2:
In order to assess the effect of different amounts of cobalt (Co) on the tensile strength of steel,
researchers conducted an experiment employing a randomized complete block design. The
treatments consisted of four different levels (expressed as percentages) of Co, and the eight
crucibles in which the alloying process took place served as the blocks. The tensile strengths in
thousands of psi of the resulting 32 specimens of steel are shown below.
Treatment (% Co)
Block (crucible) A B C D
1 43.3 45.8 45.5 44.7
2 48.3 48.7 46.9 48.8
3 49.8 48.7 56.0 48.6
4 49.8 51.3 55.3 58.6
5 56.6 56.1 58.6 54.6
6 57.6 57.5 58.1 57.7
7 72.0 74.2 89.6 82.1
8 88.1 88.7 92.6 88.2

3) Multiple Comparison Procedure for Use with Friedman Test
Researchers are usually not satisfied to know simply that their data allow them to conclude
that not all sampled populations or all treatment effects are identical. For example, when the
application of the Friedman test leads us to reject $H_0$, we are usually interested in exactly where
the differences are located. What we need, then, is a multiple-comparison procedure. When we
compare all possible differences between pairs of samples, when the experimentwise error rate is
$\alpha$, and when the number of blocks is large, we declare $R_j$ and $R_{j'}$ significantly
different if
$$|R_j - R_{j'}| \ge z_{1-[\alpha/k(k-1)]} \sqrt{\frac{bk(k+1)}{6}}$$
where $R_j$ and $R_{j'}$ are the jth and j'th treatment rank totals, and z is a value from the standard
normal distribution.

Example 6.3:
To illustrate the use of this procedure, let us consider again the data of Example 6.1. Since we
rejected $H_0$, we wish to know specifically which methods are different from which others. Suppose
we choose an experimentwise error rate of $\alpha = 0.10$. (R1. pg. 275)
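
A sketch of these comparisons in Python follows. The treatment rank totals 19.5, 9 and 25.5 are what one obtains by ranking the Example 6.1 data within each specimen (mean ranks for the tie in specimen 7); the function name is hypothetical, and the normal quantile is taken from the standard library instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def friedman_pairs(rank_sums, b, alpha):
    """Return (bound, {pair: significant}) for all pairs of treatment rank totals."""
    k = len(rank_sums)
    z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    bound = z * sqrt(b * k * (k + 1) / 6)
    decisions = {
        (i + 1, j + 1): abs(rank_sums[i] - rank_sums[j]) >= bound
        for i in range(k)
        for j in range(i + 1, k)
    }
    return bound, decisions


rank_totals = [19.5, 9.0, 25.5]  # methods A, B, C (assumed from ranking Example 6.1)
bound, decisions = friedman_pairs(rank_totals, b=9, alpha=0.10)
print(round(bound, 3))
print(decisions)
```

Pairs are labelled 1 = A, 2 = B, 3 = C; a value of True means the two rank totals differ by at least the bound.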


6.2) Page's Test For Ordered Alternatives

Assumptions:
A. The data consist of b mutually independent samples (blocks) of size k. The typical
observation $X_{ij}$ is the jth observation in the ith sample (block). The data may be
displayed as in the table of Section 6.1, where the rows represent the blocks and the columns
are called treatments. The term treatment has a very general meaning; it may refer to a
treatment in the usual sense of the word, or it may refer to some other condition such as
socioeconomic status or educational level.
B. The variable of interest is continuous.
C. There is no interaction between blocks and treatments.
D. The observations within each block may be ranked in order of magnitude.

Hypotheses:

$H_0: \tau_1 = \tau_2 = \cdots = \tau_k$
versus
$H_1$: The treatment effects $\tau_1, \tau_2, \ldots, \tau_k$ are ordered in the following way:
$\tau_1 \le \tau_2 \le \cdots \le \tau_k$.

Test Statistic:
$$L = \sum_{j=1}^{k} j R_j = R_1 + 2R_2 + 3R_3 + \cdots + kR_k$$
where $R_j$, $j = 1, 2, \ldots, k$, are the treatment rank sums obtained in the manner explained in the
Friedman test (Section 6.1).

Decision Rule:
Reject $H_0$ if $L \ge L(\alpha, k, b)$ from Table A17.

1) Large Sample Approximation
For large sample sizes, we use the test statistic
$$z = \frac{L - \left[ bk(k+1)^2 / 4 \right]}{\sqrt{b(k^3 - k)^2 / \left[ 144(k-1) \right]}} \sim N(0,1)$$



Example 6.4:
Cromer, a researcher, reported the scores made by 36 children who performed a certain task as part
of an experiment. The children, matched by chronological age and sex, were divided into three
groups. Children in group 1 were congenitally blind, those in group 2 were sighted children who
performed the task blindfolded, and those in group 3 consisted of sighted children who performed
the task without visual obstruction. The results are shown below. We wish to test the null
hypothesis of identical results against the alternative that children in group 1 tend to score lower
than those in group 2, and that those in group 2 tend to score lower than those in group 3. (R1, pg.
280).
Age Sex Blind Blindfolded Seeing
5:07 F 0 0 0
6:00 M 0 8 1
6:04 F 0 0 8
6:06 M 0 0 8
6:11 F 1 2 0
7:09 F 8 8 8
7:11 F 8 5 8
8:00 F 8 6 8
8:05 F 0 8 8
8:06 F 8 8 8
8:10 F 8 3 8
9:06 M 8 8 8
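
Page's L and its large-sample z can be computed for these scores with the sketch below. The function names are hypothetical; ties within a block (which are frequent in these data) are given mean ranks, and no further tie adjustment is made.

```python
from math import sqrt


def block_ranks(block):
    """Midranks of the values within one block (smallest value gets rank 1)."""
    ordered = sorted(block)
    return [(2 * ordered.index(x) + 1 + ordered.count(x)) / 2 for x in block]


def page_test(blocks):
    """Return (treatment rank sums, L, z) with treatments in the hypothesized order."""
    b, k = len(blocks), len(blocks[0])
    ranks = [block_ranks(row) for row in blocks]
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    l_stat = sum((j + 1) * r for j, r in enumerate(rank_sums))
    z = (l_stat - b * k * (k + 1) ** 2 / 4) / sqrt(
        b * (k ** 3 - k) ** 2 / (144 * (k - 1))
    )
    return rank_sums, l_stat, z


# Columns: Blind, Blindfolded, Seeing (the order specified by the alternative).
scores = [
    [0, 0, 0], [0, 8, 1], [0, 0, 8], [0, 0, 8], [1, 2, 0], [8, 8, 8],
    [8, 5, 8], [8, 6, 8], [0, 8, 8], [8, 8, 8], [8, 3, 8], [8, 8, 8],
]

rank_sums, l_stat, z = page_test(scores)
print(rank_sums)
print(l_stat, round(z, 3))
```

The columns must be entered in the order given by the alternative hypothesis, since L weights the jth rank sum by j.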







6.3) Durbin Test For Incomplete Block Designs
In designing an experiment, the investigator may find that it is impossible or impractical to
construct a randomized complete block design of the type discussed so far. It may be impossible or
impractical to apply all treatments to each block. This becomes an important problem when the
number of treatments is large and the size of the blocks is limited. For example, suppose we wish to
compare the effects of seven treatments by administering the treatments to laboratory animals, with
litters serving as blocks. Because the subjects must meet certain criteria, we can use only three
animals from each litter. These conditions suggest that we use an incomplete block design, since we
cannot administer each treatment to an animal from each litter.
The particular type of incomplete block design with which we are concerned is the balanced
incomplete block design. In this design every possible pair of treatments appears the same number
of times. Further, the balanced incomplete block design requires that each block contain the same
number of subjects and that each treatment occur the same number of times.

Assumptions:
A. The blocks are mutually independent of each other.
B. The observations within each block may be ranked in order of magnitude.

Hypotheses:

$H_0$: The treatments have equal effects.
versus
$H_1$: The responses to at least one treatment tend to be larger than the responses to at least
one other treatment.

Test Statistic:
Display the data in a table similar to the one below (e.g. each block has 3 subjects and each
treatment occurs 3 times).
Treatment
Block A B C D E F G
1 X X X
2 X X X
3 X X X
4 X X X
5 X X X
6 X X X
7 X X X
Note: X = response of a subject in a given block to indicated treatment.
Rank observations within each block from smallest to largest.
Treatment
Block A B C D E F G
1 R_2 R_1 R_3
2 R_2 R_1 R_3
3 R_1 R_3 R_2
4 R_1 R_2 R_3
5 R_2 R_3 R_1
6 R_1 R_2 R_3
7 R_1 R_2 R_3
Assign tied observations the mean of the rank positions for which they are tied. A moderate
number of ties does not greatly affect the results.
Test statistic:
$$T = \frac{12(t-1)}{rt(k-1)(k+1)} \sum_{j=1}^{t} R_j^2 - \frac{3r(t-1)(k+1)}{k-1},$$
which is approximately $\chi^2_{t-1}$ distributed under $H_0$,
where t = the number of treatments under investigation.
k = the number of subjects per block ($k < t$).
r = the number of times each treatment occurs.
$R_j$ = the sum of the ranks appearing under the jth treatment.

Decision Rule:
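Reject $H_0$ if $T > \chi^2_{\alpha,\,t-1}$, the upper-$\alpha$ critical value of the chi-square
distribution with $t - 1$ degrees of freedom.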

Note: the chi-square approximation is good only when r is large, and it should be realized that the
results are probably very crude when r is small.
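
The Durbin statistic can be sketched in Python as follows. Everything in the example is hypothetical: the design is a cyclic balanced incomplete block layout with t = 7, k = 3 and r = 3, and the responses are made-up values that simply increase from treatment A to treatment G. The function name is also an assumption.

```python
def durbin_statistic(blocks, treatments):
    """blocks: list of dicts {treatment: response}, each with the same k entries.

    Returns (rank sums per treatment, T).
    """
    t = len(treatments)
    k = len(blocks[0])
    r = sum(1 for blk in blocks if treatments[0] in blk)  # replications per treatment
    rank_sums = {trt: 0.0 for trt in treatments}
    for blk in blocks:
        ordered = sorted(blk.values())
        for trt, value in blk.items():
            # midrank of this response within its own block
            rank_sums[trt] += (2 * ordered.index(value) + 1 + ordered.count(value)) / 2
    total_sq = sum(v ** 2 for v in rank_sums.values())
    stat = (12 * (t - 1) / (r * t * (k - 1) * (k + 1)) * total_sq
            - 3 * r * (t - 1) * (k + 1) / (k - 1))
    return rank_sums, stat


treatments = "ABCDEFG"
# Made-up responses: 10 for A, 20 for B, ..., 70 for G.
made_up_response = {trt: 10.0 * (n + 1) for n, trt in enumerate(treatments)}
# Hypothetical cyclic design: block i holds treatments i, i+1 and i+3 (mod 7).
blocks = [
    {treatments[(i + s) % 7]: made_up_response[treatments[(i + s) % 7]]
     for s in (0, 1, 3)}
    for i in range(7)
]

rank_sums, t_stat = durbin_statistic(blocks, treatments)
print(rank_sums)
print(round(t_stat, 3))
```

If every treatment had the same rank sum, $r(k+1)/2$, the statistic would be 0; larger spreads among the $R_j$ give larger values of T.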

Example 6.5:
Two university lecturers compared the toxicity of each of seven chemicals applied to Aphis rumicis,
a black aphid found on nasturtiums. The logarithm of the dose (+3.806) required to kill 95% of the
insects exposed to a chemical was the measurement reported. Since the experimenters could test
only three chemicals a day, they used a balanced incomplete block design requiring seven days for
completion of the experiment. The toxicities are shown below. We wish to know whether we may
conclude from these data that the effectiveness of the seven chemicals differs. Use $\alpha = 0.01$.
(R1, pg. 286).


6.4) Cochran's Test For Related Observations
In some investigations that utilize the randomized complete block design, the response to a
treatment may take on only one of two values. We may arbitrarily designate these two possible
outcomes success or 1, and failure or 0. Cochran proposed a procedure for testing the null
hypothesis of equal treatment effectiveness in this situation, which is a problem of correlated
proportions. It is a generalization of McNemar's technique, discussed before, to three or more
treatments. This test is known as Cochran's Q Test.

Assumptions:
A. The data for analysis consist of the responses of r blocks to c independently applied
treatments.
B. The responses are 1 for success or 0 for failure. The results may be displayed in a
contingency table, where the $X_{ij}$'s are either 0s or 1s.
C. The blocks are a random selection of blocks from a population of all possible blocks.

Hypotheses:

$H_0$: The treatments are equally effective.
versus
$H_1$: The treatments do not all have the same effect.

Test Statistic:
Construct the table

                                 Treatment
Block              1       2       3       ...     c       Block totals
1                  X_11    X_12    X_13    ...     X_1c    R_1
2                  X_21    X_22    X_23    ...     X_2c    R_2
3                  X_31    X_32    X_33    ...     X_3c    R_3
...                ...                                     ...
r                  X_r1    X_r2    X_r3    ...     X_rc    R_r
Treatment totals   C_1     C_2     C_3     ...     C_c     N = grand total

Cochran points out that the total number of successes in a given block is considered fixed.
The test statistic is
$$Q = \frac{c(c-1) \sum_{j=1}^{c} C_j^2 - (c-1)N^2}{cN - \sum_{i=1}^{r} R_i^2},$$
which is approximately $\chi^2_{c-1}$ distributed under $H_0$.

Decision Rule:
First, delete all blocks containing only 0s or only 1s.
If the product of the number of remaining blocks and the number of treatments is 24 or more, and
the number of blocks is at least 4, then
Reject $H_0$ if $Q > \chi^2_{\alpha,\,c-1}$, the upper-$\alpha$ critical value of the chi-square
distribution with $c - 1$ degrees of freedom.
If the product of the number of remaining blocks and the number of treatments is less than 24, then
construct the exact distribution or use special tables (not provided in this course).

Example 6.6:
Gustafson and colleagues compared the abilities of three computer-aided diagnostic systems (called
models) and physician opinions (majority opinion) in diagnosing on the basis of symptoms,
physical signs, and laboratory information. The table below shows the results obtained with 11
hypothyroid patients. To test the null hypothesis that the four diagnostic methods give the same
results, we use Cochran's Q test at $\alpha = 0.05$.
Hypothyroid patient   Majority opinion   Actuarial PIP   Subjective PIP   Semi-PIP
1 1 0 0 0
2 1 1 1 1
3 0 0 0 0
4 0 1 1 1
5 1 1 1 1
6 1 0 0 1
7 1 0 1 1
8 1 0 0 1
9 1 0 0 0
10 1 0 0 0
11 1 1 1 1
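
Cochran's Q for the table above can be computed with the following sketch. The function name `cochran_q` is hypothetical. The code also deletes the blocks containing only 0s or only 1s, as the decision rule requires, and evaluates Q on the remaining blocks.

```python
def cochran_q(rows):
    """Q = [c(c-1) * sum(C_j^2) - (c-1) * N^2] / [c*N - sum(R_i^2)]."""
    c = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(c)]  # C_j
    row_totals = [sum(row) for row in rows]                       # R_i
    n = sum(row_totals)                                           # grand total N
    numerator = c * (c - 1) * sum(x ** 2 for x in col_totals) - (c - 1) * n ** 2
    denominator = c * n - sum(x ** 2 for x in row_totals)
    return numerator / denominator


# Columns: majority opinion, actuarial PIP, subjective PIP, semi-PIP.
diagnoses = [
    [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 1, 1, 1],
    [1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1], [1, 0, 0, 1],
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1],
]

# Keep only blocks that contain both 0s and 1s.
informative = [row for row in diagnoses if 0 < sum(row) < len(row)]

q_all = cochran_q(diagnoses)
q = cochran_q(informative)
print(len(informative), round(q, 3))
```

Here the number of remaining blocks times the number of treatments is 7 × 4 = 28, so the chi-square approximation with $c - 1 = 3$ degrees of freedom applies. Deleting the uninformative blocks leaves the value of Q unchanged; it only affects which reference distribution is appropriate.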



-- End of Chapter 6 --