
9.0 Improving a Classroom-Based Assessment Test
Assumptions:
You already know:
• How to plan a classroom test by:
– stating the purpose for its construction
– Specifying the learning outcomes to be assessed
– Preparing a test blueprint (ToS)
• The techniques and strategies for selecting
and constructing different item formats to
match the ILOs.
• The next phase is ensuring that the
instrument is valid by reviewing and
improving the items.
Two Approaches
1. Judgmental approach
• Teacher’s own review
a. Adherence to writing guidelines
b. Contribution to score-based inference (the test can contribute to
making valid inferences about the learners)
c. Accuracy of content
d. Absence of content gaps (not missing important content
prescribed by a curriculum standard)
e. Fairness
• Peer review
• Student review (experience in taking the test,
impressions and reactions)
2. Empirically-Based Procedures
Empirically-Based Procedures
• This technical process is referred to as item
analysis.
• Item analysis is a process which examines
student responses to individual test items
(questions) in order to assess the quality of
those items and of the test as a whole.
• Item analysis is especially valuable in
improving items which will be used again in
later tests.
• It can also be used to eliminate ambiguous or
misleading items in a single test administration.
• Item analysis is valuable for
increasing instructors' skills in test
construction, and identifying
specific areas of course content
which need greater emphasis or
clarity.
• An item is considered good when its quality
indices, i.e., difficulty index and
discrimination index, meet certain
characteristics.
• An item is good if it can discriminate
between those who perform well in the test
and those who do not.
Item Analysis

• Index of Difficulty (p)


• Index of Discrimination (D)
Index of Difficulty, P
• The item difficulty is simply the
percentage of students who answer an
item correctly.
• Item difficulty is relevant for
determining whether students have
learned the concept being tested.
Difficulty Index

• The item difficulty index ranges from 0 to
100% (0.0 – 1.0): from extremely difficult
(no one answered the item correctly) to
extremely easy (everyone answered it
correctly).
• The higher the value, the easier the
question.
Index of Difficulty
(P value)
P value range Interpretation
0.00 – 0.20 Very difficult item
0.21 – 0.40 Difficult item
0.41 – 0.60 Moderately difficult item
0.61 – 0.80 Easy item
0.81 and above Very easy item

Source: Gutierrez, D. S. (2007). Assessment of Learning Outcomes (Cognitive Domain)
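
The interpretation table above can be sketched as a small lookup. This is a minimal illustration, not part of the cited source: the function name `interpret_difficulty` is hypothetical, and rounding P to two decimals before comparing is an assumption made because the bands are stated to two decimals.

```python
def interpret_difficulty(p):
    """Map a difficulty index P (0.0 to 1.0) to the band in the table above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0.0 and 1.0")
    p = round(p, 2)  # assumption: bands are stated to two decimals
    if p <= 0.20:
        return "Very difficult item"
    if p <= 0.40:
        return "Difficult item"
    if p <= 0.60:
        return "Moderately difficult item"
    if p <= 0.80:
        return "Easy item"
    return "Very easy item"


print(interpret_difficulty(0.60))
print(interpret_difficulty(0.75))
```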


Index of Discrimination, D
• A measure of the extent to which a test item
discriminates or differentiates between
students who do well on the overall test and
those who do not do well on the overall test, or
between more knowledgeable and less
knowledgeable learners.
• Item discrimination refers to the ability of an
item to differentiate among students on the
basis of how well they know the material being
tested.
• The higher the discrimination index (D),
the more marked the magnitude of the
difference between the performance of
those who scored high and those who
scored low in an item, and thus, the more
discriminating the item is.
• The item will have low discrimination if it
is so difficult that almost everyone gets it
wrong or guesses, or so easy that almost
everyone gets it right.
Different Directions
• Positively discriminating item – the proportion
of the high-scoring group answering correctly is
greater than that of the low-scoring group.
• Negatively discriminating item – the proportion
of the high-scoring group answering correctly is
less than that of the low-scoring group.
• Not discriminating – the proportion of the
high-scoring group answering correctly is equal
to that of the low-scoring group.
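
The three directions can be expressed as a direct comparison of the two group proportions. The sketch below is illustrative only; the function name and the returned labels are assumptions, not terms from the slides.

```python
def discrimination_direction(p_upper, p_lower):
    """Label the direction of discrimination from the two group proportions."""
    if p_upper > p_lower:
        return "positive"
    if p_upper < p_lower:
        return "negative"
    return "none"  # both groups answered correctly at the same rate


print(discrimination_direction(0.86, 0.21))
print(discrimination_direction(0.50, 0.71))
```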
Index of Discrimination
( D value)
D value range Interpretation
0.40 and above Very good items
0.30 to 0.39 Reasonably good items, but possibly subject to improvement
0.20 to 0.29 Marginal items, usually needing improvement
0.19 and below Poor items, to be rejected or improved by revision
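
As with the difficulty bands, the D value table can be written as a lookup. This is a sketch under stated assumptions: the function name is hypothetical, and D is rounded to two decimals first because the table states its bands to two decimals.

```python
def interpret_discrimination(d):
    """Map a discrimination index D (-1.0 to 1.0) to the band in the table above."""
    if not -1.0 <= d <= 1.0:
        raise ValueError("D must lie between -1.0 and 1.0")
    d = round(d, 2)  # assumption: bands are stated to two decimals
    if d >= 0.40:
        return "Very good item"
    if d >= 0.30:
        return "Reasonably good item, but possibly subject to improvement"
    if d >= 0.20:
        return "Marginal item, usually needing improvement"
    return "Poor item, to be rejected or improved by revision"


print(interpret_discrimination(0.65))
print(interpret_discrimination(-0.21))
```

Negative values fall into the lowest band here, which matches the "0.19 and below" row.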
Test items
• Good or retained test item
• Fair or revised item
• Poor or rejected item
Good or retained test item
• A good or retained test item must have both
an acceptable difficulty index and an acceptable
discrimination index.
• The acceptable index of difficulty ranges
from 0.41 to 0.60, while the acceptable
index of discrimination ranges from +0.20
to +1.00.
Fair or revised item
• Has either an unacceptable difficulty index or
an unacceptable discrimination index, but not both.
Poor or rejected item
• Has both an unacceptable difficulty index
and an unacceptable discrimination index. It has to be
discarded right away.
Type of item      Features present
Good (Retained)   Both difficulty and discrimination indices acceptable
Fair (Revised)    Either difficulty or discrimination index unacceptable
Poor (Rejected)   Both difficulty and discrimination indices unacceptable
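
The retain/revise/reject rule can be sketched as follows. The thresholds are the ones stated above (difficulty 0.41 to 0.60, discrimination +0.20 to +1.00); the function name, the parameter names, and the rounding to two decimals are assumptions made for illustration.

```python
def classify_item(p, d, p_range=(0.41, 0.60), d_min=0.20):
    """Return (remark, decision) for an item from its P and D values."""
    # Assumption: indices are compared after rounding to two decimals.
    p_ok = p_range[0] <= round(p, 2) <= p_range[1]
    d_ok = round(d, 2) >= d_min  # D can never exceed +1.00
    if p_ok and d_ok:
        return ("Good", "Retain")
    if p_ok or d_ok:
        return ("Fair", "Revise")
    return ("Poor", "Reject")


print(classify_item(0.54, 0.65))
print(classify_item(0.57, 0.14))
print(classify_item(0.90, 0.07))
```

Passing a different `p_range` lets the same rule be applied with another acceptable difficulty band.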
Item Analysis: Examples

• Difficulty index, P
• Discrimination index, D
Difficulty index (P)
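• Consider the following example, in which each number is the count of
students who chose that option and the asterisk marks the correct option.

• A B C* D
3 0 18 9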

• How does that help us?


Difficulty index (P)
• P = (number of students selecting the correct answer) /
(total number of students attempting the item)

P = 18 / 30 = 0.60

P < 0.25: the item is relatively difficult
P > 0.75: the item is relatively easy
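
The calculation above can be reproduced from the option counts. This is a minimal sketch; the function name and the dictionary layout are assumptions, and the counts are the ones from the example.

```python
def difficulty_index(option_counts, key):
    """Proportion of students who chose the keyed (correct) option."""
    total = sum(option_counts.values())
    if total == 0:
        raise ValueError("no students attempted the item")
    return option_counts[key] / total


# Counts from the example: option C is the correct answer.
counts = {"A": 3, "B": 0, "C": 18, "D": 9}
p = difficulty_index(counts, key="C")
print(round(p, 2))
```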
1. What is the difficulty index of an item if 25 students are unable
to answer it correctly while 75 answered it correctly? Give your
interpretation and decision (action).
2. Compute the difficulty index for the following items. Interpret
your results. (The asterisk indicates the correct option)

a. A B C* D
3 0 18 9

b. A B* C D
10 5 8 0

c. A B C* D
4 2 16 3
1. What is the difficulty index of an item if 25 students are unable
to answer it correctly while 75 answered it correctly? Give your
interpretation and decision (action).
P = 75/100 = 0.75 (easy)
2. Compute the difficulty index for the following items. Interpret
your results. (The asterisk indicates the correct option)

a. A B C* D
3 0 18 9
P = 18/30 = 0.60 (moderately difficult)
b. A B* C D
10 5 8 0
P = 5/23 = 0.22 (difficult)
c. A B C* D
4 2 16 3
P = 16/25 = 0.64 (easy)
U-L Index Method for Item Analysis
• One method that can be employed for item
analysis is the Upper-Lower Index method.
• The most commonly used U-L Index
Method (Stecklein, 1957) is the Upper and
Lower 27%.
Steps
1. Score the test papers and arrange the total scores from
highest to lowest.
2. Split the test papers into an upper group and a lower
group.
3. For a class of 50 or fewer, do a 50-50 split.
4. For a big group (>100), segregate the top and bottom
25-27% of the papers. Maintain equal numbers of test papers
in the upper and lower groups.
5. Obtain the p value for the upper group and the p value for
the lower group.
6. Get the discrimination index by subtracting the lower group's
p value from the upper group's p value.
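
The steps above can be sketched in code. Everything named here is hypothetical: the function names, the integer `percent` parameter, and the ten-student data set are made up for illustration. The group size is rounded half up with integer arithmetic and capped at half the class so that the two groups never overlap; both choices are assumptions, not rules from the slides.

```python
def split_upper_lower(total_scores, percent=27):
    """Return (upper, lower) lists of student indices ranked by total score."""
    n = len(total_scores)
    size = min((n * percent + 50) // 100, n // 2)  # equal, disjoint groups
    if size == 0:
        raise ValueError("class too small for this percentage")
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    return ranked[:size], ranked[-size:]


def discrimination_index(item_correct, upper, lower):
    """Return (p_upper, p_lower, D) for one item scored 1 (right) or 0 (wrong)."""
    p_upper = sum(item_correct[i] for i in upper) / len(upper)
    p_lower = sum(item_correct[i] for i in lower) / len(lower)
    return p_upper, p_lower, p_upper - p_lower


# Hypothetical data: total test scores and one item's 0/1 results per student.
total_scores = [18, 7, 15, 9, 20, 5, 12, 10, 16, 3]
item_correct = [1, 0, 1, 1, 1, 1, 1, 0, 0, 0]

upper, lower = split_upper_lower(total_scores)
p_upper, p_lower, d = discrimination_index(item_correct, upper, lower)
print(upper, lower, round(d, 2))
```

Calling `split_upper_lower(total_scores, percent=50)` gives the 50-50 split suggested for small classes.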
Item Analysis
(Difficulty Index)

• To compute the index of difficulty for an
item, follow the formula below:
Difficulty index, p = (p upper group + p lower group) / 2
Discrimination Index, D

• Items with negative discrimination indices,
even when large in magnitude, are subject right
away to revision if not deletion.
• With multiple-choice items, a negative D is a
sign of errors in item writing:
– Wrong key (informed students selected a distracter which is actually the
correct answer)
– Unclear problem in the stem, leading to more than one correct answer
– Ambiguous distracters (informed students are divided among
the attractive options)
– Implausible keyed option (more informed students will not choose it)
Results of Item Analysis
No. students tested = 50
Item #   Upper 27% (n = 14)   Lower 27% (n = 14)   P      D       Remarks   Decision
         No.     p            No.     p
1        12                   3
2        14                   7
3        7                    10
4        12                   6
5        10                   4
6        14                   14
7        11                   1
8        13                   12
9        9                    7
10       4                    14
Results of Item Analysis
No. students tested = 50
Item #   Upper 27% (n = 14)   Lower 27% (n = 14)   P      D       Remarks   Decision
         No.     p            No.     p
1        12      0.86         3       0.21         0.54   0.65    Good      Retain
2        14      1.00         7       0.50         0.75   0.50    Good      Retain
3        7       0.50         10      0.71         0.61   -0.21   Fair      Revise
4        12      0.86         6       0.43         0.65   0.43    Fair      Revise
5        10      0.71         4       0.29         0.50   0.42    Fair      Revise
6        14      1.00         14      1.00         1.00   0.00    Poor      Reject
7        11      0.79         1       0.07         0.43   0.72    Good      Retain
8        13      0.93         12      0.86         0.90   0.07    Poor      Reject
9        9       0.64         7       0.50         0.57   0.14    Fair      Revise
10       4       0.29         14      1.00         0.65   -0.71   Fair      Revise
• Item 3 is a fair item: p is acceptable; D is
unacceptable.
– A negative D indicates that those who did poorly
on the entire test (lower group) chose the
correct answer for this item more often than
those who did well on the same test.
– The options of this multiple-choice item are not
working well and are not plausible for the good performers.
• Item 9 is also a fair question because p is
acceptable (0.57) but D is not acceptable
(0.14).
– A positive D indicates that those who did well
(upper group) selected the correct answer for
this item more often than those who did poorly on
the same test.
– Since D is positive but unacceptable, some
of the options are not plausible enough for the
good performers. Some of the options need to
be revised.
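
The P and D columns of the results table can be recomputed from the counts of correct answers in each group of 14. The sketch below is an illustration, not the source's procedure. It assumes, as an inference from the published figures, that each group proportion is rounded to two decimals (half up) before P and D are derived. Working from the raw counts instead shifts the last digit for some items: for item 1, D = 9/14, which rounds to 0.64 rather than 0.65. The names `r2`, `item_indices` and `correct_counts` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

GROUP_SIZE = 14  # students in each of the upper and lower 27% groups


def r2(x):
    """Round a Decimal to two places, halves rounded up."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def item_indices(upper_correct, lower_correct, group_size=GROUP_SIZE):
    """Return (p_upper, p_lower, P, D), derived from rounded proportions."""
    p_upper = r2(Decimal(upper_correct) / Decimal(group_size))
    p_lower = r2(Decimal(lower_correct) / Decimal(group_size))
    p = r2((p_upper + p_lower) / 2)  # average of the two group p values
    d = r2(p_upper - p_lower)        # upper minus lower
    return p_upper, p_lower, p, d


# Item number -> (correct in upper group, correct in lower group), from the table.
correct_counts = {
    1: (12, 3), 2: (14, 7), 3: (7, 10), 4: (12, 6), 5: (10, 4),
    6: (14, 14), 7: (11, 1), 8: (13, 12), 9: (9, 7), 10: (4, 14),
}

for item, (u, l) in correct_counts.items():
    print(item, *item_indices(u, l))
```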
