Printed in Great Britain. All rights reserved. Copyright © 1986 Pergamon Journals Ltd.
Menucha Birenbaum
Tel Aviv University, Israel
The issue of cognitive diagnosis, long ignored by the mainstream of research
psychology, has begun to receive much attention in recent years. Cognitive
psychologists, Artificial Intelligence (AI) experts and psychometricians are
trying to model the thought processes of individuals with respect to how they
solve a particular task. Needless to say, this shift in focus from
product-oriented assessment toward process-oriented assessment, which is meant
to draw inferences about the mental processes underlying the subject's
performance on the task, has important implications for one of the most crucial
issues in measurement, namely test validity. As pointed out by Embretson (1985),
"The traditional approach to psychological measurement has had neither the
theoretical foundation nor the psychometric models that could specify explicitly
those substantive qualities of test stimuli or of persons that underlie
responses" (p. xi).
It seems that the process-oriented approach has great potential in the area of
educational assessment. It is true that the term "diagnosis" is not novel in
educational circles. Educators have been engaged in diagnosis for many years.
Textbooks for training student teachers stress the importance of diagnosis for
prescribing remedial instruction. A variety of commercial diagnostic tests are
available on the market. However, most of the references made to diagnosis in
education are to "deficit analysis", i.e., identifying areas of weakness on the
part of the student with respect to various topics of the subject matter
(Bejar, 1984).
As pointed out by Burton (1981), one can specify four levels of diagnosis. The
simplest is concerned with determining whether or not a student has mastered a
skill. The degree of mastery is represented by a numeric value.
According to Glaser (1981), the future of testing lies in the higher levels
of diagnosis. As he states: "An important skill of teaching is the ability to
synthesize from a student's performance an accurate picture of the
misconceptions that lead to error. This task goes deeper than identifying
incorrect answers and pointing these out to the student: it should identify the
nature of the concept or rule that the student is employing that governs her or
his performance in some systematic way" (Glaser, 1981, p. 926).
Surveys of teachers' attitudes toward tests and test usage suggest that more
knowledge is associated with more positive attitudes toward tests and, given
the resources, with increased test use. Moreover, the quality of instruction
improves when teachers use direct and frequent measurement strategies
(Tollefson et al., 1985). It is therefore important to introduce teachers to
the advantages of diagnostic assessment and thereby affect the quality of
instruction. As stated by Glaser (1981), "Rule assessment approaches assume
that conceptual development can be thought of as an ordered sequence of
learned, partial understandings. Because individuals acquire concepts in
subject-matter domains to various levels of understanding in a reasonably
predictable fashion, the assessment of levels of knowledge can be linked to
appropriate instructional
Rule Assessment Approach 161
METHOD
The Data-Set
For the sake of simplicity and clarity, simulated data were used in the
present study. Twenty-five response vectors on a 10-item fraction addition test
were generated on the basis of "bugs" identified in previous empirical studies,
which consisted of large data sets collected in the United States and in Israel
(Birenbaum & Shaw, 1985; Tatsuoka, 1984).
In order to eliminate noise, careless errors were not included in the
simulation. Moreover, inconsistencies due to strategy changes during the test,
which are quite common in real-life testing, were not considered; nor were
reducing errors at the final stage. This was done in order to enable a coherent
presentation of the method of analysis and of the interpretation of the test
results.
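The simulation step just described can be sketched in code. The bug procedures
and items below are illustrative stand-ins, not the study's actual bug library
or test items; the point is that a simulated examinee applies one rule
consistently, so the resulting response vector contains no careless errors or
strategy shifts.

```python
from fractions import Fraction

# Hypothetical bug procedures (illustrative, not the paper's bug library).
# Each takes two simple fractions b/c and e/f and returns the answer the
# rule would produce.
def bug_add_all(b, c, e, f):
    # classic "add the numerators, add the denominators" bug
    return Fraction(b + e, c + f)

def correct(b, c, e, f):
    return Fraction(b, c) + Fraction(e, f)

# Three made-up items, each encoded as (b, c, e, f) for b/c + e/f.
items = [(1, 2, 1, 3), (2, 5, 1, 5), (3, 4, 1, 6)]

def response_vector(rule):
    # 1 marks a correct response; otherwise the erroneous answer is recorded,
    # so the same error code would later be substituted into the S-P chart.
    return [1 if rule(*it) == correct(*it) else rule(*it) for it in items]

print(response_vector(correct))      # an examinee with no bug
print(response_vector(bug_add_all))  # an examinee applying the bug throughout
```

Because the simulated rule is applied deterministically, each generated vector
is perfectly consistent, which is exactly the noise-free condition described
above.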
The Test
The 10-item fraction addition test used in this study consisted of pairs
of parallel items: two pairs of mixed-fraction items, one with like and the
other with unlike denominators, and three pairs of simple fractions, one with
like and two with unlike denominators, of which one had like numerators.
The items were adopted from a larger test designed for diagnostic purposes,
which was used for the empirical data collection upon which the present
simulation is based. The original test consisted of 48 open-ended items
(Klein et al., 1981). A Task Specification Chart (TSC), which included content
and procedural facets, was used for the test design (Birenbaum & Shaw, 1985).
Analysis
A "bug library", which includes "bugs" that were hypothesized on the basis
of the TSC and confirmed in previous empirical studies, was used for coding
response patterns. (The list of error codes used in the present study appears
in Exhibit 1.) The S-P chart technique (Sato, 1975) was used for presenting
test results, with error codes substituting for incorrect responses.
An S-P chart is a binary student-by-item (row × column) matrix in which the
students have been arranged in descending order of their total scores and the
items in ascending order of difficulty. The S curve is the step-function ogive
of the cumulative distribution of test scores for the group of students. The P
curve is the counterpart for the group of items.
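The rearrangement that produces an S-P chart can be sketched as follows; the
4 × 4 binary score matrix is made up purely for illustration.

```python
# Minimal sketch of arranging a binary student-by-item score matrix into
# S-P form: students (rows) in descending order of total score, items
# (columns) in ascending order of difficulty, i.e., descending order of
# the number of correct responses.
scores = [
    [1, 0, 1, 1],   # student total: 3
    [1, 1, 0, 0],   # 2
    [1, 1, 1, 1],   # 4
    [0, 0, 1, 0],   # 1
]

# Sort students by total score, highest first.
rows = sorted(scores, key=sum, reverse=True)

# Sort items by number of students answering correctly, highest first
# (easiest item leftmost).
col_totals = [sum(col) for col in zip(*rows)]
order = sorted(range(len(col_totals)), key=lambda j: col_totals[j], reverse=True)
sp = [[row[j] for j in order] for row in rows]

# The S curve steps down the matrix after each student's total; the P curve
# is its counterpart over item totals.  Here we just print the matrix.
for row in sp:
    print(row)
```

With the matrix in this form, the S and P step-function ogives can be read off
directly from the cumulative row and column totals.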
Sato has suggested two indices for maximizing the information from test
results: one is the caution index (CI) and the other the disparity coefficient.
EXHIBIT 1: List of the Error Codes in the S-P Chart
(simple fractions are written b/c + e/f and mixed numbers a b/c + d e/f;
entries marked [illegible] could not be recovered from the source)

Code  Algorithm
A     a b/c + d e/f = (a+d) (b+e)/(c+f)
B     b/c + e/f = be/cf
C     b/c + e/f = (b+e)/cf
D     b/c + e/f = (b+e)/LCD
E     b/c + e/f = b/(c+f)              (iff b = e)
F     [illegible]
G     b/c + e/f = bf/ce
H     b/c + e/f = (bf+ef)/f            (iff c = f)
I     a b/c = (ab+c)/c
J     I + A:  a b/c + d e/f = (ab+c)/c + (de+f)/f = ((ab+c)+(de+f))/(c+f)
K     b/c + e/f = (b+f)/(c+e)          (code letter unclear in source)
M     a b/c + d e/f = (a+b+c)/(d+e+f)  (code letter unclear in source)
N     a b/c + d e/f = (a+b+d+e)/(a+c+1+f)   [partially illegible]
R     b/c + e/f = (bf+ec)/2cf
U     [illegible; denominator is the LCD]
V     a b/c + d e/f = (ac+b)/c + (df+e)/f = (a+d) ((ac+b)+(df+e))/(c+f)
W     b/c + e/f = (b+e)/(c+f)          [partially illegible]
Y     a b/c + d e/f = (a+b+c)/c + (d+e+f)/f
Z     Y + A:  a b/c + d e/f = ((a+b+c)+(d+e+f))/(c+f)
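As a worked illustration, two of the simpler erroneous rules in Exhibit 1 can
be applied to a made-up item, 1/2 + 1/3 (the item itself is not from the study;
the rules are bug B, which multiplies numerators and denominators, and bug C,
which adds the numerators over the product of the denominators).

```python
from fractions import Fraction

# Item: b/c + e/f with b/c = 1/2 and e/f = 1/3 (made up for illustration).
b, c, e, f = 1, 2, 1, 3

correct = Fraction(b, c) + Fraction(e, f)   # the true sum, 5/6
bug_B = Fraction(b * e, c * f)              # bug B: be/cf = 1/6
bug_C = Fraction(b + e, c * f)              # bug C: (b+e)/cf = 2/6 = 1/3

print(correct, bug_B, bug_C)
```

Both bugs yield answers distinct from the correct sum and from each other on
this item, so an incorrect response can be traced back to a specific rule.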
FIGURE 1: S-P Chart
[The printed chart, a 25-student by 10-item matrix with error codes substituted
for incorrect responses, together with item difficulties, caution signals,
item-total correlations, and summary statistics (average raw score 3.16, i.e.,
31.60%; Cronbach's alpha .91), is not legible in the source.]
As to student #6, who got a total score of 70% on the test: his/her trouble
is with converting mixed numbers to improper fractions. (S)he multiplies the
whole number by the numerator and adds the denominator. Note that this student
gains an extra 10% on the test score because item #7 (whose whole number is 1)
is not sensitive to this "bug".
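This conversion bug can be made concrete with a short sketch. The numbers are
made up; only the rule itself, computing (a·b + c)/c instead of the correct
(a·c + b)/c for a mixed number a b/c, comes from the analysis above.

```python
from fractions import Fraction

def convert_correct(a, b, c):
    # correct conversion of the mixed number a b/c to an improper fraction
    return Fraction(a * c + b, c)

def convert_bug(a, b, c):
    # student #6's rule: multiply the whole number by the numerator
    # and add the denominator
    return Fraction(a * b + c, c)

# With a whole number of 2 the bug is visible ...
print(convert_correct(2, 1, 3), convert_bug(2, 1, 3))   # 7/3 vs 5/3

# ... but with a whole number of 1 both rules give (b + c)/c, so an item
# whose whole number is 1 cannot detect this bug, and the student picks up
# "free" credit on it.
assert convert_correct(1, 1, 3) == convert_bug(1, 1, 3)
```

This is precisely why item #7 awards the student an undeserved extra 10%.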
Note that the overall picture of the test from a conventional psychometric
perspective is one of a difficult but highly reliable test. All its items fall
into the low CI category: six are classified as having a high difficulty level
and four, a medium difficulty level. The disparity coefficient is relatively
low, indicating a hierarchical structure of the task.
Although the overall quality of the test used here, as measured by
conventional psychometric indices, proved satisfactory, it was shown that the
traditional interpretation, which refers to total test scores, can be
misleading, especially when adaptive remediation is sought. It is well known in
the medical sciences that a disease has several symptoms, yet several diseases
can share the same symptoms (e.g., high fever). Consequently, no responsible
physician would prescribe the same medicine for two patients suffering from
different diseases just because they both share high fever as one of their
symptoms. Similarly, when two students with different misapprehensions get the
same total test score, should the teacher prescribe the same remediation for
correcting their misapprehensions?
Although the method of diagnostic test construction is beyond the scope of
this paper, it should be noted that test design is a crucial matter which
ultimately determines the quality of the diagnosis. One therefore has to choose
the items for the diagnosis carefully in order to maximize the information
about the rules of operation underlying the students' responses. A task
specification chart (Birenbaum & Shaw, 1985) may serve as a useful tool in the
process of test construction. As was illustrated in the chart, when an item
yields the same result under various "bugs", its contribution to rule
assessment is in question.
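A minimal sketch of such a screening check follows, under the assumption of a
hypothetical two-rule library: the correct algorithm, and a bug that places the
sum of the numerators over the least common denominator without first
converting the numerators. The items are made up; an item is diagnostically
weak for a pair of rules whenever both rules yield the same answer on it.

```python
from fractions import Fraction
from itertools import combinations
from math import lcm

# Hypothetical rule library (illustrative, not the paper's bug library).
rules = {
    "correct": lambda b, c, e, f: Fraction(b, c) + Fraction(e, f),
    # sum of numerators over the LCD, numerators not converted
    "bug_lcd": lambda b, c, e, f: Fraction(b + e, lcm(c, f)),
}

# Two candidate items, encoded as (b, c, e, f) for b/c + e/f.
items = {"1/5 + 2/5": (1, 5, 2, 5), "1/2 + 1/3": (1, 2, 1, 3)}

for name, it in items.items():
    for r1, r2 in combinations(rules, 2):
        same = rules[r1](*it) == rules[r2](*it)
        verdict = "indistinguishable" if same else "distinguishable"
        print(f"{name}: {r1} vs {r2} -> {verdict}")
```

Note that on the like-denominator item the LCD bug happens to produce the
correct answer, so that item contributes nothing toward detecting this rule,
while the unlike-denominator item separates the two rules cleanly.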
REFERENCES
BROWN, J.S. and BURTON, R.R. (1978). Diagnostic models for procedural bugs in
basic mathematical skills. Cognitive Science, 2, 155-192.
BURTON, R.R. (1981). Diagnosing bugs in a simple procedural skill. Palo Alto,
CA: Xerox Palo Alto Research Center.
BUTTERFIELD, E.C., NIELSEN, D., TANGEN, K.L. and RICHARDSON, M.B. (1985).
Theoretically based psychometric measures of inductive reasoning. In: S.E.
Embretson (Ed.) Test design: Developments in psychology and psychometrics.
New York: Academic Press.
EMBRETSON, S.E. (Ed.) (1985). Test design: Developments in psychology and
psychometrics. New York: Academic Press.
GLASER, R. (1981). The future of testing: A research agenda for cognitive
psychology and psychometrics. American Psychologist, 36, 923-936.
HARNISCH, D.L. (1983). Item response patterns: Application for educational
practice. Journal of Educational Measurement, 20, 191-206.
HAYES, J.R. and FLOWER, L. (1980). Identifying the organization of writing
processes. In L.W. Gregg and E.R. Steinberg (Eds.) Cognitive processes in
writing. Hillsdale, NJ: Erlbaum.
KLEIN, M., BIRENBAUM, M., STANDIFORD, S.M. and TATSUOKA, K.K. (1981). On the
construction of an error-diagnostic test in fraction arithmetic. Technical
Report 81-6-NIE. Urbana: University of Illinois, Computer-based Education
Research Laboratory.
OHLSSON, S. and LANGLEY, P. (1985). Identifying solution paths in cognitive
diagnosis. Technical Report CMU-RI-TR-85-2. The Robotics Institute,
Carnegie-Mellon University.
PELLEGRINO, J.W. and GLASER, R. (1980). Components of inductive reasoning. In
R.E. Snow, P.A. Federico, and W. Montague (Eds.) Aptitude, learning and
instruction: Vol. 1. Cognitive process analyses of aptitude. Hillsdale,
NJ: Erlbaum.
PELLEGRINO, J.W., RANDAL, J.M. and SHUTE, V.J. (1985). Analyses of spatial
aptitude and expertise. In: S.E. Embretson (Ed.) Test design: Developments
in psychology and psychometrics. New York: Academic Press.
SATO, T. (1975). The construction and interpretation of S-P tables. Tokyo:
Meiji Tosho (in Japanese).
SATO, T. (1980). The S-P chart and the caution index. NEC Educational
Information Bulletin. Computer and Communication Systems Research Laboratories,
Nippon Electric Co., Ltd.
SHAUGHNESSY, M. (1977a). Errors and expectations. New York: Oxford University
Press.
SHAUGHNESSY, M. (1977b). Some needed research on writing. College Composition
and Communication, 28, 317-321.
SIEGLER, R.S. (1976). Three aspects of cognitive development. Cognitive
Psychology, 8, 481-520.
SIEGLER, R.S. (1978). The origins of scientific reasoning. In R.S. Siegler
(Ed.), Children's thinking: What develops? Hillsdale, N.J.: Erlbaum.
SLEEMAN, D.H. and SMITH, M.J. (1981). Modeling students' problem solving.
Artificial Intelligence, 16, 171-187.
STERNBERG, R.J. (1977a). Component processes in analogical reasoning.
Psychological Review, 84, 353-378.
STERNBERG, R.J. (1977b). Intelligence, information processing, and analogical
reasoning: The componential analysis of human abilities. Hillsdale, NJ:
Erlbaum.
STERNBERG, R.J. and McNAMARA, T.P. (1985). The representation and processing of
information in real-time verbal comprehension. In: S.E. Embretson (Ed.) Test
design: Developments in psychology and psychometrics. N.Y.: Academic Press.
TATSUOKA, K.K. (1982). Rule space, the product space of two score components in
signed-number subtraction: An approach to dealing with inconsistent use of
erroneous rules. Urbana, IL: Computer-based Educational Research Laboratory,
University of Illinois at Urbana-Champaign.
TATSUOKA, K.K. and TATSUOKA, M.M. (1983). Spotting erroneous rules of operation
by the individual consistency index. Journal of Educational Measurement, 20,
221-230.
TATSUOKA, K.K. (1983). A probabilistic model for diagnosing misconceptions by a
pattern classification approach. Research Report 83-4-ONR. Computer-based
Educational Research Laboratory. University of Illinois at Urbana-Champaign.
TATSUOKA, K.K. (1984). Analysis of errors in fraction addition and subtraction
problems (Final Report NIE-G-81-0002). Urbana: University of Illinois,
Computer-based Education Research Laboratory.
TOLLEFSON, N., TRACY, D.B., KAISER, J., CHEN, J.S., and KLEINSASSER, A. (1985).
Teachers' attitudes toward tests. Paper presented at the annual meeting of the
American Educational Research Association, Chicago, Illinois.
VANLEHN, K. (1981). Bugs are not enough: Empirical studies of bugs, impasses
and repairs in procedural skills. Technical Report CIS-11. Palo Alto, CA:
Xerox Palo Alto Research Center.
VANLEHN, K. (1983). Felicity conditions for human skill acquisition: Validating
an AI-based theory. Research Report CIS-21. Palo Alto, CA: Xerox Palo Alto
Research Center.
WHITELY, S.E. (1980). Multicomponent latent trait models for ability tests.
Psychometrika, 45, 479-494.
WHITELY, S.E. (1981). Measuring aptitude processes with multicomponent latent
trait models. Journal of Educational Measurement, 18, 67-84.
THE AUTHOR