
A Model for the Observer on the Farnsworth-Munsell 100-Hue Test
B. J. Craven

Purpose. To use a theoretical model of the observer on the Farnsworth-Munsell 100-hue test
to estimate the magnitude of random variation in 100-Hue test error scores.
Methods. The model was based upon classical signal detection theory. Results from the model
were obtained by computer simulation.
Results. There is a fairly regular relationship between mean test scores over many tests and the
standard deviation of those scores. This relationship is for practical purposes unaffected by
polarity in the observer's hue discrimination and by changes in the detailed assumptions of the
model.
Conclusion. The model provides a flexible tool for further theoretical research into the 100-hue test. Invest Ophthalmol Vis Sci. 1993;34:507-511.

From the Department of Psychology, University of Stirling, and the
Department of Communication and Neuroscience, University of Keele,
United Kingdom.
Submitted for publication: August 3, 1992; accepted October 14, 1992.
Proprietary interest category: N.
Reprint requests: Dr. B. J. Craven, Department of Psychology, University
of Stirling, Stirling FK9 4LA, United Kingdom.

Investigative Ophthalmology & Visual Science, March 1993, Vol. 34, No. 3
Copyright Association for Research in Vision and Ophthalmology

If a test of visual function, such as the Farnsworth-Munsell 100-hue test,1 is to have any great clinical or
scientific usefulness, it is important that test-to-test
variations in the subject's score on the test are small or
at least of known magnitude. In this context, "small"
means small compared to those variations that the experimenter is interested in detecting, for example,
those due to defective color vision.
There may be many causes of such test-to-test variations, but they can be broadly classified as being random errors, which cause test-to-test variations that do
not affect the mean score, or systematic (or constant)
errors, which bias the obtained score away from what
might be considered the "true" value. Both of these
kinds of error can affect the result of the FM 100-hue
test. Where testing is done once only, as in screening
procedures, it is the likely random error that is of interest; one would like to know the probability of a color-normal
observer producing a score indicative of color
deficiency, and conversely, the probability of classifying a color-defective individual as color-normal. In
longitudinal studies involving repeated testing of the
same subjects (as when color vision testing is used to
monitor the progress of diseases such as diabetes),
practice effects may produce systematic errors, in the
sense that later tests will produce lower scores than
earlier tests, independently of any change in visual
function.2-4 This report is concerned with random
errors only, and in particular with determining the
variability in (ie, standard deviation of) 100-hue error
scores as a function of performance level.
This question has been addressed theoretically by
Victor,5 who obtained (by simulation) the relationship
between mean score and standard deviation of scores.
He modeled imperfect hue discrimination by starting
from perfectly sorted caps and then swapping randomly chosen pairs of adjacent caps. The performance
level was varied by varying the number of swaps in
each box. He assumed that because the swaps were
independent events, the number of swaps in each box
on a given test should be drawn from a Poisson distribution. The validity of the swapping procedure as a
method of producing realistic cap arrangements is
open to some question. Victor's procedure chose pairs
of caps to be swapped entirely at random, arguing that
successive swaps are independent events. However,
consider as an example the following cap order: 1 2 3 6
5 4 7 8 9. If successive swaps are independent, the next
swap is just as likely to involve caps 4 and 7 as it is to
involve caps 1 and 2. However, the difference in hue
between caps 4 and 7 is greater than that between caps
1 and 2 (by a factor of 3, if we assume hue differences
are equal and additive), and a subject whose hue discrimination was uniform around the hue circle would
be more likely to swap caps 1 and 2 (resulting in an
error score of 12) than to swap caps 4 and 7 (resulting
in an error score of 16). Thus, we can expect the errors
produced by human subjects to be more evenly distributed than those obtained by independent swappings of
caps, with a resulting effect on the scores obtained.
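The two error scores quoted in this example can be checked with a short sketch. It assumes the usual Farnsworth scoring rule (each cap scores the sum of the absolute differences between its number and those of its two neighbours, minus 2) and assumes the nine caps sit inside an otherwise correctly ordered run, so that the neighbours just outside the run are scored too. The function names `segment_error_score` and `swap` are illustrative only.

```python
def segment_error_score(order):
    """Farnsworth-style error score for a run of caps embedded in an
    otherwise correctly ordered sequence.

    Each cap scores |cap - left| + |cap - right| - 2. The run is padded
    with two correctly placed caps on each side so that the caps just
    outside the run are scored as well.
    """
    lo, hi = min(order), max(order)
    padded = [lo - 2, lo - 1] + list(order) + [hi + 1, hi + 2]
    total = 0
    for k in range(1, len(padded) - 1):
        left, cap, right = padded[k - 1], padded[k], padded[k + 1]
        total += abs(cap - left) + abs(cap - right) - 2
    return total


def swap(order, a, b):
    """Return a copy of `order` with caps a and b exchanged."""
    out = list(order)
    i, j = out.index(a), out.index(b)
    out[i], out[j] = out[j], out[i]
    return out


start = [1, 2, 3, 6, 5, 4, 7, 8, 9]
print(segment_error_score(start))              # 8
print(segment_error_score(swap(start, 1, 2)))  # 12
print(segment_error_score(swap(start, 4, 7)))  # 16
```

Under these assumptions the starting arrangement scores 8, and the two candidate swaps raise it to 12 and 16 respectively, in line with the figures in the text.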
Three cautions must be added. First, the very fact
that cap 4 has been moved up the order (giving us
evidence that the subject has misassessed the hue of
cap 4) makes us more confident that the subject
should want to swap it with cap 7 than we would have
been had cap 4 been in its rightful position (that swaps
only of nearest neighbors are allowed does not affect
the argument). In other words, the difference in the
probabilities of swapping caps 1 and 2 and caps 4 and
7 may not be as great as suggested above. The second
caution is that modeling incorrect cap arrangements
by repeated swapping of caps should not be taken as
implying that a real subject does it that way. However,
the example given does suggest that a real subject is
likely to distribute errors more uniformly than does
Victor's cap-swapping process. The final caution is
that the 100-hue test is such a complicated system that
calculating the effects (on score variability) of deficiencies in any simulation procedure will be very difficult and perhaps impossible.
One way to attempt to resolve doubts about the
validity of the cap-swapping procedure is to use a different method of generating incorrect cap arrangements and see whether similar results hold.
This report describes a theoretical model of the
subject on the 100-hue test that generates cap arrangements by explicitly modeling random errors in the
subject's encoding of the hue of each cap. This approach has been widely used in psychophysical research into vision. The model was used in computer
simulations to determine how the standard deviation
of scores on repeated 100-hue tests varies with mean
score. The results were found to differ only minutely
from those of Victor,5 increasing our confidence that
the result is correct. Good agreement also was found
between the model and empirical data.6 The results
were robust with respect to the detailed assumptions
of the model. The model also yielded a further result
not obtainable with the cap-swapping procedure: that the variability in scores is only slightly affected by
polarity in the subject's hue discrimination.
Theory of Random Errors on the 100-Hue Test
The Farnsworth-Munsell 100-hue test is designed to
measure subjects' ability to correctly discriminate
slightly differing hues. As a result, it has much in common with a wide range of psychophysical tasks used to
measure discrimination performance along other perceptual dimensions. Many different theoretical models have been used to try to account for the pattern of
subjects' behavior in standard psychophysical tasks.
One of the most widespread and influential models is
that of signal detection theory (SDT).7 The popularity
of SDT can be partly accounted for by the ease and
elegance with which it provides models for a very wide
range of different psychophysical tasks, using essentially identical assumptions in each case. To the author's knowledge, no previous attempt has been made
to apply the SDT approach to the 100-hue test.
Outline of Signal Detection Theory Approach
Whatever the particular psychophysical task, signal
detection theory makes the fundamental assumption
that somewhere within the subject, the relevant attribute of each of the stimuli is quantitatively represented
as the value of a variable (internal to the subject), and
that the subject's decision on a given trial is based
solely upon the values of these variables. The theory
assumes further that this quantitative representation is
subject to random errors; that is, a given stimulus
may elicit a different internal value each time it is presented. In other words, the encoding of the relevant
stimulus dimensions is subject to noise. Thus, one stimulus that is physically greater than another stimulus
may on some occasions elicit a smaller internal value
and thus be perceived as being smaller. It is this noise
that limits performance in discrimination tasks and
sets what is commonly called the threshold for the discrimination in question. In all cases, the subject is assumed to apply a fixed decision rule to determine from
the values of the internal variables the response that is
most likely to be correct.
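As an illustration of how noise limits discrimination, the following sketch computes the probability that two stimuli are encoded in the wrong order. It assumes, beyond what this paragraph states, that the noise on each stimulus is independent and Gaussian with a common standard deviation; the function name `p_misorder` is hypothetical.

```python
import math


def p_misorder(delta, sigma):
    """Probability that two stimuli separated by `delta` are encoded in
    the wrong order, given independent Gaussian noise of SD `sigma` on each."""
    if sigma == 0:
        return 0.0
    # The difference of the two noisy encodings is Gaussian with mean
    # delta and SD sigma * sqrt(2); a misorder occurs when it is negative.
    z = delta / (sigma * math.sqrt(2.0))
    # Standard normal lower tail: Phi(-z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for delta in (1, 2, 3):
    print(delta, round(p_misorder(delta, 1.0), 4))
```

With noise of one unit, stimuli one unit apart are confused on roughly a quarter of occasions, and the probability falls steeply as the separation grows.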
The same fundamental assumptions can be applied without modification to model a subject performing the 100-hue test. We assume that the hue of
each cap is represented by the value of an internal
variable, and that the subject sorts the caps according
to these internal representations. Because the internal
representation of hue is subject to random error, the
ordering produced by the subject will not necessarily
be the correct one, and the greater the magnitude of
the random error, the more haphazard the cap order
produced by the subject. An inevitable consequence
of this process is that subjects' scores on the test, being
determined in part by random processes, will show
random variation. Indeed, if a subject consistently ordered the caps in the same incorrect order, we would
be forced to conclude that although the subject's color
matching performance was abnormal, his or her hue
discrimination was very good. Imperfect hue discrimination necessarily implies variation in test scores from
test to test. The amount of random variation (we will
consider the standard deviation of a subject's test
scores) is likely to depend upon the absolute level of
performance. The following model was constructed
with a goal of determining the relationship between
the mean score over many tests and the standard deviation of the scores.

METHODS
We assume that the color of each cap $i$ (where the cap
number $i$ takes values from 1 to 85) is encoded by the
value of some internal one-dimensional variable $x_i$ (although color space is three-dimensional, the cap sequence of the FM 100-hue test is one-dimensional). If
the subject encoded color perfectly, without random
error, the value of each $x_i$ would be $i$ (this choice is
merely for convenience; any regular spacing would
do). With the introduction of random errors, each $x_i$ is
perturbed by the addition of a random variable $r$
drawn from a Gaussian distribution of zero mean and
standard deviation $\sigma$. Values of $r$ are drawn independently for each cap. The standard deviation $\sigma$ is a free
parameter; by varying $\sigma$ we can manipulate the absolute level of performance. It will be shown later that
the choice of a Gaussian distribution of $r$ is not crucial.
On each test, each cap $i$ is assigned a value $x_i$, as
just described, and the caps within each box are sorted
into increasing order of $x_i$. Because of the random
component of the $x_i$'s, the sorted order may not be the
correct order. The sorted caps are then scored according to the Farnsworth convention, exactly as if a human subject had ordered the caps. By repeating this
procedure many times, adequately precise values of
the mean score and the standard deviation of scores
can be obtained.
On a computer, the above procedure was performed for many values of the internal standard deviation $\sigma$ to obtain means and standard deviations of
scores corresponding to a wide range of performance
levels.
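The procedure described above can be sketched in a few lines. This is a minimal illustration rather than the original program, and it rests on several assumptions not stated in the text: the 85 caps are split into four contiguous boxes of 22, 21, 21, and 21 caps; the fixed end caps of each box are ignored; and scoring is done around the closed loop of 85 caps, with each cap scoring the sum of its differences from its two neighbours, minus 2. All names (`BOXES`, `error_score`, `simulate_test`, `mean_and_sd`) are hypothetical.

```python
import random
import statistics

N_CAPS = 85
# Assumed box layout: four contiguous runs of caps (22 + 21 + 21 + 21).
BOXES = [range(1, 23), range(23, 44), range(44, 65), range(65, 86)]


def circular_diff(a, b):
    """Distance between two cap numbers around the closed 85-cap loop."""
    d = abs(a - b)
    return min(d, N_CAPS - d)


def error_score(order):
    """Total error score: each cap scores the sum of its differences
    from its two neighbours, minus 2."""
    n = len(order)
    total = 0
    for k, cap in enumerate(order):
        left, right = order[k - 1], order[(k + 1) % n]
        total += circular_diff(cap, left) + circular_diff(cap, right) - 2
    return total


def simulate_test(sigma, rng):
    """One simulated test: encode cap i as i + Gaussian noise, then sort
    the caps within each box by their encoded values."""
    order = []
    for box in BOXES:
        encoded = [(cap + rng.gauss(0.0, sigma), cap) for cap in box]
        order.extend(cap for _, cap in sorted(encoded))
    return error_score(order)


def mean_and_sd(sigma, runs=2000, seed=0):
    """Mean and standard deviation of the score over repeated tests."""
    rng = random.Random(seed)
    scores = [simulate_test(sigma, rng) for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0, 4.0):
        mean, sd = mean_and_sd(sigma)
        print(f"sigma={sigma:>4}: mean={mean:7.1f}  sd={sd:6.1f}")
```

Raising `runs` toward 10,000 gives smoother estimates at the cost of run time; the absolute scores depend on the box layout and scoring assumptions made here.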

FIGURE 1. The standard deviation of scores of the 100-hue
test as a function of mean score, as predicted by the model
described in the text.

RESULTS
Figure 1 shows the relationship between mean score
(on the x axis) and standard deviation of scores (on the
y axis). Both axes are logarithmic. Note that neither
axis represents an independent variable: The true independent variable is the internal standard deviation $\sigma$.
The different points on the graph were obtained by
using different values of $\sigma$. Each point is the result of
10,000 simulated runs of the 100-hue test. The data
points fall on a slight curve. For scores of 100 or less,
the relationship could be conveniently summarized by
saying that the standard deviation of scores on the
100-hue test for a single consistent subject is close to
twice the square root of the mean score for that subject.

Effect of the Shape of the Internal Noise Distribution
So far, it has been assumed that the random errors
with which the cap hues are encoded are normally distributed, but this need not be the case. To see whether
the assumption of Gaussian noise is crucial, the procedure was repeated, but with the Gaussian noise distributions replaced by distributions of different shapes.
The shapes used were a rectangular distribution, and
the highly asymmetric distribution of the product of
pairs of numbers drawn from a rectangular distribution, which rises abruptly to a sharp peak on the low
side and declines gradually on the high side. This
choice of shapes was atheoretical and arbitrary, but
the intention was to use distributions highly unlike the
original Gaussian distribution. Based on Figure 2, in
which results for all three distribution shapes are plotted on the same axes, the shape of the internal noise
distribution is not critical in determining the relationship between the mean and standard deviation of 100-hue test scores.
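The three noise shapes compared in this part of the study (Gaussian, rectangular, and the product of two rectangular draws) can be sketched as interchangeable samplers. The text does not say how the alternative distributions were scaled, so this sketch assumes the rectangular draws come from the unit interval and rescales every shape to zero mean and a common standard deviation; the function names are hypothetical.

```python
import math
import random
import statistics


def gaussian(rng, sd):
    """Zero-mean Gaussian noise with standard deviation sd."""
    return rng.gauss(0.0, sd)


def rectangular(rng, sd):
    """Zero-mean rectangular (uniform) noise with standard deviation sd."""
    # Uniform on [-h, h] has SD h / sqrt(3).
    h = sd * math.sqrt(3.0)
    return rng.uniform(-h, h)


def rect_product(rng, sd):
    """Product of two U(0, 1) draws, shifted and scaled to zero mean and SD sd."""
    # The raw product has mean 1/4 and SD sqrt(7)/12, with a sharp peak
    # at the low end and a gradual decline towards the high end.
    raw = rng.random() * rng.random()
    return (raw - 0.25) * sd * 12.0 / math.sqrt(7.0)


NOISE_SHAPES = {
    "Gaussian": gaussian,
    "Rectangular": rectangular,
    "Rect squared": rect_product,
}

rng = random.Random(0)
for name, draw in NOISE_SHAPES.items():
    sample = [draw(rng, 2.0) for _ in range(20000)]
    print(name, round(statistics.mean(sample), 2), round(statistics.pstdev(sample), 2))
```

Matching the standard deviations across shapes is a design choice made here so that the shapes differ only in form; because the caps are sorted on their encoded values, a constant shift in the noise mean would not change the resulting order in any case.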

FIGURE 2. As for Figure 1, but with data plotted for three
different shapes of the subject's internal error distribution (legend: Gaussian, Rectangular, Rect squared).
The relationship between mean and SD is not greatly affected by the different assumptions.

Effect of Polarity in the 100-hue Plot
It is common for color-deficient subjects to distribute
their errors nonuniformly among the 85 caps in the
test. This nonuniformity was modeled by varying the
standard deviation $\sigma$ of the internal noise distribution
sinusoidally as a function of cap number. For a given
maximum value of $\sigma$, which we will call $\sigma_{\max}$, the function is

$$\sigma = \frac{\sigma_{\max}}{2}\left(1 + \sin[2\pi i N/85]\right), \qquad (1)$$

where $i$ denotes the cap number and $N$ the number of
complete cycles of the sinusoid in 85 caps. Thus, the
minimum value of $\sigma$ was always zero. Values of $N$ of 1,
2, and 4 cycles were used, and $\sigma_{\max}$ was varied to vary
the level of performance.
In Figure 3, the results for the three values of N
are plotted with data for constant a (as in Fig. 1). The
three curves for the sinusoidally varying errors lie
closely on top of each other and differ only slightly
from the curve for uniform errors. The relationship
between the mean and standard deviation of test
scores is not affected to any significant extent by polarity in the subject's hue discrimination.
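The sinusoidal noise profile can be tabulated directly. The sketch below assumes Equation 1 has the form sigma_i = (sigma_max / 2)(1 + sin(2 pi i N / 85)), which is the reading consistent with a maximum of sigma_max and a minimum of zero; the function name `sigma_profile` and the value 3.0 used for sigma_max are illustrative only.

```python
import math

N_CAPS = 85


def sigma_profile(sigma_max, n_cycles):
    """Noise SD for caps 1..85, assuming
    sigma_i = (sigma_max / 2) * (1 + sin(2*pi*i*N / 85))."""
    return [
        0.5 * sigma_max * (1.0 + math.sin(2.0 * math.pi * i * n_cycles / N_CAPS))
        for i in range(1, N_CAPS + 1)
    ]


for n_cycles in (1, 2, 4):
    profile = sigma_profile(3.0, n_cycles)
    print(n_cycles, round(min(profile), 4), round(max(profile), 4))
```

Because the cap number takes only integer values, the sampled profile comes very close to zero and to sigma_max without landing exactly on either. Each entry would then serve as the noise standard deviation for the corresponding cap in the simulation.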
FIGURE 3. As for Figure 1, but with data plotted for a subject
whose hue discrimination shows polarity. Data are plotted
for subjects whose hue discrimination varies sinusoidally by
1, 2, or 4 cycles around the 100-hue loop. Data for a subject
whose discrimination is uniform are plotted for comparison.
Polarity does not cause a large change in the data.

Consistency With Empirical Data
The results in Figures 1, 2, and 3 show there is a regular relationship between the mean score on the FM
100-hue test and the standard deviation of the scores.
The relationship is robust with respect to the only true
free parameter of the model (the shape of the internal noise distribution) and with respect to nonuniformities in the subject's performance. But we do not
yet know whether the model provides a good prediction of any data we might collect from human subjects.
There are arguments both that we should expect
greater variability from human subjects and that we
should expect lesser variability from human subjects.
These arguments are covered in the remainder of this
section, followed by a brief comparison with data from
human subjects.
The model of the subject assumes there are no
sources of random fluctuation of scores apart from
the inevitable random perturbations of the internal
encodings of the colors of the caps. In practice, this
assumption may be untenable. The subject's performance may be affected by motivational, physiological,
diurnal, or even (if the conditions are not well controlled) photometric variables. It would be very difficult
to quantitatively assess the effects of these factors.
All that can be said is that there are reasons to believe
the standard deviations of test scores obtained in practice may be higher than those suggested by theory.

FIGURE 4. Comparison of theoretical predictions of score
variability with empirical data from Chisholm.6 Data are
shown for Chisholm's nine subjects. The mean score for
each subject is given at the foot of each pair of columns
(44, 157, 185, 275, 521, 603, 687, 696, and 785); each pair
shows the empirical s.d. and the theoretical s.d. In
each case, the standard deviation has been predicted from
Figure 1 on the basis of the subject's mean score.

However, it is possible that as well as suffering
from random errors of encoding, the subject also
makes systematic errors. For example, there may be
two caps that the subject finds essentially indistinguishable on the basis of hue and hence should order randomly from test to test. If, however, there is a detectable brightness difference between the caps, the subject may order them on the basis of brightness, without realizing he or she is doing so. The order chosen by
the subject for the two caps may be consistently wrong.
In such cases, the variation in test scores will be less
than that expected on the basis of theory.
Given that the theory provides neither an upper
nor a lower bound on the variability to be expected in
practice, it becomes worthwhile to look at empirical
data to see how well it conforms to the predictions of
the model. The study of the relationship between
mean score on the test and standard deviation of
scores is methodologically difficult (because of possible practice effects) and tedious. However, one such
study has been reported by Chisholm.6 Figure 4 shows
the actual standard deviations obtained by Chisholm
for each of nine subjects and the predicted standard
deviations obtained from the subjects' mean scores. A
two-tailed t-test was performed upon the fractional
differences between the predicted and empirical data
for each subject. The difference was not statistically
significant (t = 1.35, 8 degrees of freedom; P > 0.2).
Although the sample size is small, approximate calculation of the power of the test8 indicates that if the
discrepancy between theory and data were 20% (for
example), and the critical significance level was chosen
to be 0.05, the probability of correctly rejecting the
null hypothesis would be about 0.85. It is clear that the
standard deviations predicted by the model do not
differ dramatically from those found in practice.
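The reported significance level can be checked from the t statistic and degrees of freedom alone. The sketch below integrates the density of Student's t distribution numerically with Simpson's rule, using only the standard library; the function names `t_pdf` and `two_tailed_p` are illustrative, and nothing here reproduces Chisholm's underlying data.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2.0)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value, obtained by integrating the density from 0 to |t|
    with Simpson's rule (steps must be even)."""
    t = abs(t)
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area *= h / 3.0
    return 1.0 - 2.0 * area


print(round(two_tailed_p(1.35, 8), 3))
```

For t = 1.35 with 8 degrees of freedom the two-tailed probability comes out a little above 0.2, consistent with the P > 0.2 quoted in the text.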
DISCUSSION
Figure 5 shows Victor's5 predictions of error score
standard deviation plotted with the data from Figure
1. Any difference between the two sets of results is of
negligible practical significance. The two approaches
to modeling performance on the 100-hue test therefore can be regarded as lending support to each other.
Why the agreement should be so good is not clear.
There appears to be some underlying regularity in the
way that cap arrangements are converted to test scores
that determines the variability in scores almost independently of any assumptions made in the models.
FIGURE 5. As for Figure 1, but with data from Victor5 plotted
for comparison (legend: Craven, Victor).

As well as confirming an existing result, the signal
detection theoretical model of the 100-hue test observer
is valuable in itself. It goes beyond Victor's
model in that it models the processes within the subject that limit performance, rather than using an arbitrary method of generating cap arrangements. It
yields the result that the variability in scores is not
affected by polarity in the distribution of errors. Finally, the techniques involved have had a very successful history in other areas of perceptual research and
should provide a useful and flexible tool for further
research into the 100-hue test.
Key Words
Farnsworth-Munsell 100-hue test, random errors, signal detection theory.
References
1. Farnsworth D. The Farnsworth-Munsell 100-Hue
Test for the Examination of Colour Discrimination:
Manual. Baltimore: Munsell Color Company; 1957.
2. Breton ME, Fletcher DE, Krupin T. Influence of serial
practice on Farnsworth-Munsell 100-hue test scores: the learning effect. Applied Optics. 1988; 27:1038-1044.
3. Fine BJ, Kobrick JL. Field dependence, practice, and
low illumination as related to the Farnsworth-Munsell
100-hue test. Percept Mot Skills. 1980; 51:1167-1177.
4. Fine BJ. Farnsworth-Munsell 100-hue test and learning: Reestablishing the priority of a discovery. Applied
Optics. 1990; 29:186.
5. Victor JD. Evaluation of poor performance and asymmetry in the Farnsworth-Munsell 100-hue test. Invest
Ophthalmol Vis Sci. 1988; 29:476-481.
6. Chisholm IA. An evaluation of the Farnsworth-Munsell 100-hue test as a clinical tool in the investigation
and management of ocular neurological deficit. Transactions of the Ophthalmological Society of the United Kingdom. 1969; 89:243-250.
7. Green DM, Swets JA. Signal Detection Theory and Psychophysics. New York: Wiley; 1966.
8. Howell DC. Statistical Methods in Psychology. 2nd ed.
Boston: Duxbury Press; 1982.
