Item Analysis

1.
0 INTRODUCTION
For this project, our group is required to build or design one complete objective
test paper which consists of 40 items or multiple choice questions. We have decided to
build an English test paper with 40 items and each item contains 4 options. Our test paper
also consists of 5 sections and it must be answered by the students within 1 hour.
This multiple choice test paper is designed to fulfill several objectives as
mentioned below:
1) It is a part of the course’s assessment (DPP 406 – Pentaksiran Pembelajaran).
2) It can be used to see and evaluate the students’ behaviour.
3) It can measure the students’ performance and differentiate between the top (high
scorers) and bottom performers (low scorers) on the test as a whole for the chosen
class.
4) It can measure the students’ factual and procedural knowledge.
5) It is useful for measuring whether the students can analyze material in the items
correctly or not.
6) It can be used to check whether the students have learned facts and routine
procedures which have only one clear correct answer.
7) It encourages students to think critically before they choose the best options for
the particular question.
8) It enables us (teachers) to analyze each item that we have built, in which the key
aim in this item analysis is to increase the reliability and validity of the test. By
1
analyzing the items, we will be able to identify the weak items and try to come
out with solutions to improve the particular item.
Our group has chosen to distribute the test papers to one Form 2 class in Sekolah
Menengah Kebangsaan (SMK) Bukit Gambir, Pulau Pinang. We went to the school twice,
which is on 9th June 2008 (Monday) to ask permission from the school’s principal and
discuss with the English Language Coordinator regarding the suitable time to distribute
the test papers. On 13th June 2008 (Friday), we went to the school again to distribute the
test papers to the Form 1 students with the advanced level.
The students begin to answer the test paper on 8.45 a.m. until 9.45 a.m. After all
the students have answered all the questions within the stipulated time, our group has
taken the opportunity to ask for the students’ feedback regarding the question paper that
we have distributed to them. Overall, the students felt that the items in the question paper
are at the average or medium level, whereby there are some difficult questions which
create confusion to the students and some easy questions which can be answered by the
students without any problem. Generally, our group felt very happy and satisfied with the
school and students’ cooperation during the distribution of the test papers.
Besides that, our group has also prepared the project’s report which contains 4
main topics such as the test specification table, the students’ performance (frequency,
mean, mode, median, variance and standard deviation), item analysis and suggestions or
improvements on weak items.
2
2.0 TEST SPECIFICATION TABLE
Subject: English Language (B.I)

Class : Form 2
Date : 13th June 2008 (Friday)
Time : 8.45a.m - 9.45a.m (1 Hour)
BLOOM’S TAXONOMY (LEARNING OUTCOMES)

ITEM NUMBER BEHAVIORAL OBJECTIVES
(NO.) Knowledge Comprehension Application Analysis Synthesis Evaluation
Section A
(Graphic
Materials and
Stimuli ) –
Students would
need to study
information found
in the graphic
materials and
short texts and
answer the
questions based on
them.
• Item No.1 1) Students will be able to

identify the correct
meaning of the sign
√
given.
3
identify the most accurate √
information of the label as
displayed in the given
picture.

identify the purpose of the √
letter correctly.

differentiate the √
characteristics between
the 2 types of bats in
Malaysia as described in
the given text.

identify the correct √
synonym for the word
“range” as described in
the given text.
4
objective of the workshop
as shown in the poster
given.

identify the most likely √
food to be served for the
participants based on the
poster given.

identify the correct step to √
be taken before the bowl
is covered as described in
the label given.

identify the correct step √
that people should take
when there is a tsunami
warning by referring to
the news report given.
5
identify the correct type √
of menu as shown in the
picture given.
Section B
(Rational
Cloze) – This
section tests
the students’
knowledge of
grammar and
vocabulary.
Students need
to learn how
they can apply
and use the
clues in the
text to get the
correct answer.
• Item No.11
11) Students will be able to
choose the most accurate √
interrogative pronoun to
construct a grammatically
correct sentence.
6
correct sentence.

correct sentence.

correct sentence.

correct sentence.
7
correct sentence.

correct sentence.

correct sentence.
Section C
(Closest In
Meaning) – In
this section,
students will
learn how to
use available
clues to answer
8
the questions
on similar
expressions.

choose the best meaning
for the underlined phrases
√
as provided in the
conversation given.

choose the best meaning √
as provided in the
conversation given.

as provided in the
conversation given.

as provided in the
conversation given.
9
as provided in the
conversation given.
Section D
(Reading
Comprehension)
– This section
provides all
kinds of
comprehension
passage and
students should
know how to
read and
understand
different types of
comprehension
passages.
• Item No.24 24) Students will be able to √

choose the correct third
step in making a kite by
looking at the information
in the given descriptive
passage.
10
identify the correct type √
of bamboo sticks to make
the kite by looking at the
information in the given
descriptive passage.

identify the correct tool to √
make the 2 sticks into a
frame based on the
descriptive passage.

synonym for the word
“apply” as described in
the given descriptive
passage.

choose the correct answer √
to fill in the blank in the
sentence by referring to
the information in the
given descriptive passage.
11
choose the correct √
purpose of the letter by
looking at the information
in the given letter.

distinguish the correct √
answer from the false
answers based on the
letter.

identify the correct reason √
for Connelia Eleanor to
invite Emelda Allyn to
her school according to
given letter.

choose the closest √
meaning for the phrase
“to request a favour”
correctly by looking at the
given letter.
12
choose the best √
description about Emelda
Allyn based on the
letter.

predict the main objective √
of the Drama Society.
Section E
(Literature
Component) –
In this section,
students need
to have better
understanding
of the
literature texts.

purpose for the writer to
go to Innisfree based on
the given poem.
13
choose the best answer to √
fill in the blank in the
sentence by referring to
given poem.

identify the correct way to √
for the writer to earn
living in Innisfree.

for the phrase “gone back
to our ancestors” as
shown in the extract of
the given short story.

character who should be
the chief of the village
based on custom by
referring to the extract of
the given short story.
14
identify the important √
moral value that the
people of Dalat had learnt
from the incident
happened in their village
correctly by looking at the
extract of the given short
story.
TOTAL 40 Items / Behavioral Objectives 5 24 9 2 - -
Percentage 40 % 5% 24 % 9% 2% - -
15
3.0 THE OVERALL REPORT OF THE STUDENTS’ PERFORMANCE
THROUGH FREQUENCY, MEAN, MODE, VARIANCE AND
STANDARD DEVIATION
Overall, the performance of the form two students was very good. This is because none of
the student fails in this exam. Most of them managed to get a good marks range from 30 or 70%
and above. This can be shown in the score table, Table 3.2, and the frequency table, Table 3.3,
below whereby only one of the student got 22 marks out of 40 marks or 55%, 3 students got 27
marks out of 40 marks which also amount to 67.5%, 5 students got 29 marks over 40 marks that
is 72.5%, 13 students got the marks range from 30 to 34 marks over 40 marks and 8 students got
the marks range from 35 to 39 over 40 marks. In the other words, from the results shown, they
have clearly shown that the exam questions can be considered as very easy and the level of
intelligence of the students were excellent. This can be supported by the histogram shown in
Histogram 1.
STUDENTS SCORE (x) x-m (x-m)2

S30 39 6.8 46.24
S11 38 5.8 33.64
S4 36 3.8 14.44
S16 36 3.8 14.44
S17 36 3.8 14.44
S22 36 3.8 14.44
S26 36 3.8 14.44
S9 35 2.8 7.84
S1 34 1.8 3.24
S7 34 1.8 3.24
S19 34 1.8 3.24
S21 34 1.8 3.24
S24 34 1.8 3.24
S28 34 1.8 3.24
S12 33 0.8 0.64
S18 33 0.8 0.64
S13 32 -0.2 0.04
16
S14 32 -0.2 0.04
S5 31 -1.2 1.44
S20 31 -1.2 1.44
S25 30 -2.2 4.84
S3 29 -3.2 10.24
S6 29 -3.2 10.24
S10 29 -3.2 10.24
S15 29 -3.2 10.24
S27 29 -3.2 10.24
S2 27 -5.2 27.04
S23 27 -5.2 27.04
S29 27 -5.2 27.04
S8 22 -10.2 104.04
Mean (m) Mode Variance (s2) Standard Deviation (SD)

32.20 34 14.16 3.763
TABLE 3.1: Mean, mode, variance and standard deviation.
Meanwhile, from the table above we can see that the mean or the average marks obtained
by 30 students of this Form Two class is equal to 32.20 which also amount to 80.5%. Whereas,
the mode which also refers to the score that occurs most frequently for this class is 34. In the
other words, most of the students had averagely scored 85% for this test. The variance or the
average of the squared differences of the students’ scores from the mean is equal to 14.16 and the
standard deviation or the average of the differences of all scores from the mean is equal to 3.763.
17
Scores (x/40) Percentage (%) Number of Students
40 100% 0
39 97.5% 1
38 95% 1
37 92.5% 0
36 90% 5
35 87.5% 1
34 85% 6
33 82.5% 2
32 80% 2
31 77.5% 2
30 75% 1
29 72.5% 5
28 70% 0
27 67.5% 3
26 65% 0
25 62.5% 0
24 60% 0
23 57.5% 0
22 55% 1
21 52.5% 0
20 50% 0
TABLE 3.2: Students Performances’ Table
Histogram 1: Skewed Distribution
18
Frequenc
Bin y
22 1
25.4 0
28.8 3
32.2 10
35.6 9
More 7
TABLE 3.3: Frequency
From the histogram shown above, it shows that the distributions of the students’ results
are negatively skewed. Meaning to say, the exam questions are quiet easy for them therefore
most of the students managed to score a very good grade for this test. This may be caused by the
level of intelligence of the students themselves, in which they are from the advance class. Thus,
their mastery of English Language is very good. Apart from that, the designed items in this exam
paper can be considered as unable to measure, to evaluate or to discriminate the real performance
of each student in the classroom as they still cannot differentiate between the excellent students
and the poor students.
19
4.0 ITEM ANALYSIS
Item analysis is carried out to aid in evaluating the effectiveness of a test item. In conducting the
item analysis of this test, item difficulty, item discrimination and distracters quality will be
considered. The students are categorised into upper-group and lower-group. The scores of the
students were sorted descending 27% of the high-scoring students are categorised into upper-
group and 30% of the low-scoring students are categorised into lower-group students.
STUDENTS SCORE (x)

S30 39
S11 38
S4 36
S16 36
S17 36
S22 36
S26 36
S9 35
S1 34
S7 34
S19 34
S21 34
S24 34
S28 34
S12 33
S18 33
S13 32
S14 32
S5 31
20
S20 31
S25 30
S3 29
S6 29
S10 29
S15 29
S27 29
S2 27 - Upper-group students
S23 27 - Lower-group students
S29 27
S8 22
4.1 Item Difficulty
The index of difficulty (p value) is the proportion of the total group who got an item right. It
ranges between 0 – 1. A p-value closer to 1 indicates the easier the item is as more students can
answer the item correctly. On the other hand, the closer the value to 0, the more difficult the item
is as fewer students got the answer right. The index of difficulty (p) of each item of the test was
calculated using the following formula:
where,
P = index of difficulty
Ncorrect = number of students answering correctly
Ntotal = number of students taking the test
Upper Group Lower Group Middle

(8 Students) (9 Students) Group Discrimination
Item Pall Pupper Plower
(13 Index
A B C D A B C D
Students)
1 1 1 *5 1 1 5 *1 2 8 0.467 0.625 0.111 0.514
2 0 *8 0 0 0 *8 0 1 13 0.967 1.000 0.889 0.111
3 0 *8 0 0 0 *8 0 1 12 0.933 1.000 0.889 0.111
4 *6 1 0 1 *7 2 0 0 10 0.767 0.750 0.778 -0.028
5 *7 1 0 0 *5 4 0 0 4 0.533 0.875 0.556 0.319
6 0 1 2 *5 2 3 1 *3 12 0.667 0.625 0.333 0.292
21
7 0 0 0 *8 0 0 6 *3 9 0.667 1.000 0.333 0.667
8 0 *8 0 0 1 *7 1 0 12 0.900 1.000 0.778 0.222
9 0 0 *8 0 1 1 *7 0 12 0.900 1.000 0.778 0.222
10 0 0 0 *8 1 1 0 *7 13 0.933 1.000 0.778 0.222
11 0 0 0 *8 0 0 0 *9 13 0.967 0.875 1.000 -0.125
12 *6 2 0 0 *6 2 0 1 7 0.633 0.750 0.667 0.083
13 *7 0 0 1 *0 1 5 3 9 0.533 0.875 0.000 0.875
14 0 *8 0 0 1 *7 1 0 11 0.867 1.000 0.778 0.222
15 0 1 0 *7 2 1 3 *3 5 0.500 0.875 0.333 0.542
16 0 1 0 *7 3 0 1 *5 10 0.733 0.875 0.556 0.319
17 *7 0 1 0 *4 1 4 0 7 0.600 0.875 0.444 0.431
18 0 0 *8 0 2 0 *7 0 10 0.833 1.000 0.778 0.222
19 0 0 *8 0 2 1 *6 0 11 0.833 1.000 0.667 0.333
20 0 *8 0 0 1 *8 0 0 13 0.967 1.000 0.889 0.111
21 0 *7 0 1 0 *7 0 2 13 0.900 0.875 0.778 0.097
22 0 0 *8 0 0 0 *9 0 13 1.000 1.000 1.000 0.000
23 0 0 0 *8 0 0 0 *9 13 1.000 1.000 1.000 0.000
24 *7 0 0 1 *9 0 0 0 11 0.867 0.750 1.000 -0.250
25 *8 0 0 0 *5 1 0 3 10 0.767 1.000 0.556 0.444
26 0 *8 0 0 1 *6 2 0 13 0.900 1.000 0.667 0.333
27 0 0 *8 0 0 2 *7 0 13 0.933 1.000 0.778 0.222
28 *8 0 0 0 *8 0 0 1 11 0.900 1.000 0.889 0.111
29 *8 0 0 0 *7 0 2 0 12 0.900 1.000 0.778 0.222
30 *5 0 3 0 *5 0 3 1 6 0.533 0.625 0.556 0.069
31 2 0 0 *6 4 2 0 *3 8 0.567 0.750 0.333 0.417
32 *8 0 0 0 *4 1 3 1 11 0.767 1.000 0.444 0.556
33 *8 0 0 0 *4 2 1 2 9 0.700 1.000 0.444 0.556
34 0 *8 0 0 0 *9 0 0 9 0.867 1.000 1.000 0.000
35 0 0 *8 0 0 0 *9 0 13 1.000 1.000 1.000 0.000
36 *8 0 0 0 *8 1 0 0 13 0.967 1.000 0.889 0.111
37 0 *8 0 0 2 *7 0 1 11 0.833 1.000 0.667 0.333
38 0 0 *8 0 1 1 *7 0 13 0.967 1.000 0.889 0.111
39 0 0 *8 0 0 0 *9 0 13 1.000 1.000 1.000 0.000
40 *4 0 0 4 *4 0 1 4 10 0.633 0.500 0.556 -0.056
* Denotes Correct Answer
TABLE 4.1: Calculation of P-values And Discrimination Index for Each Item
From the table above, the p-value for each item is listed in the Pall column. The p-values
calculated range from 0.467 to 1.000. Since this test is a norm-referenced test (NRT), therefore
22
the average difficulty index should be within 0.34 to 0.66. The difficulty indices were analysed
using the Henning’s (1987) guidelines as shown in the following table:
High Medium Low

(Difficult) (Moderate) (Easy)
≤ 0.33 0.34 – 0.66 ≥ 0.67
TABLE 4.2: Henning’s Guidelines
Item with p-value lesser than 0.33 will be considered as a difficult item. Whereas item with p-
value between 0.34 to 0.66 is a moderately difficult item and item with p-value more than 0.67 is
an easy item.
Level of difficulty Items Count

High (≤ 0.33) - 0
Medium (0.34 – 0.66) 1,5,12,13,15,17,30,31,40 9
2,3,4,6,7,8,9,10,11,14,16,18,19,20,21,22,23,
Low (≥ 0.67) 31
24,25,26,27,28,29,32,33,34,35,36,37,38,39
TABLE 4.3: Item Categorisation Based On the Level of Difficulty
Based on the table above, most items are categorised into low difficulty item which means this
test is an easy test. There seems to be an imbalance distribution among easy and difficult items
where 78% of the items have low level of difficulty whereas none of the item in the test has high
level of difficulty. However, the discrimination index should also be considered in analysing
these test items.
4.2 Item Discrimination
The index of discrimination (D) is the difference between the proportion of the upper group who
answered an item correctly and the proportion of the lower group who answered the item
correctly. This index is dependent upon the difficulty of an item which means it relates with the
index of difficulty of the item. The D of each item for this test is calculated using the formula
below:
D = Pupper – Plower
23
where,
D = item discrimination for an individual item
Pupper = item difficulty for the upper group on the whole test
Plower = item difficulty for the lower group on the whole test
Ebel’s (1979) criteria and guidelines for categorising discrimination indices is a widely quoted
set of guidelines and therefore is used in this test analysis to categorise the 40 test items.
Discrimination index
Description
(D)
≤ 0.19 (Bad) The item should be eliminated or completely revised.
0.20 – 0.29 (OK) The item is marginal and needs revision.
0.30 – 0.39 (Good) Little or no revision is required.
≥ 0.40 (Very Good) The item is functioning quite satisfactorily.
TABLE 4.4: Ebel’s Guidelines
Based on the Ebel’s guidelines in the above table, the 40 test items can be categorised as follows:
Discrimination Index
Items Count
(D)
≤ 0.19 (Bad) 2,3,4,11,12,20,21,22,23,24,28,30,34,35,36,38,39,40 18
0.20 – 0.29 (OK) 6,8,9,10,14,18,27,29 8
0.30 – 0.39 (Good) 5,16,19,26,37 5
≥ 0.40 (Very Good) 1,7,13,15,17,25,31,32,33 9
TABLE 4.5: Item Categorisation Based on Discrimination Index
The results indicate that about 65% of the test items are week in discriminating between the
‘good’ and ‘weak’ students and these items need to be looked closely as they may need
considerable revision or elimination.
Item Difficulty
Item Medium (0.34 –
Discrimination High (≤ 0.33) Low (≥ 0.67)
0.66)
≤ 0.19 (Bad) - 12,30,40 2,3,4,11,20,21,22,23,24,28,
24
34,35,36,38,39
0.20 – 0.29 (OK) - - 6,8,9,10,14,18,27,29
0.30 – 0.39 (Good) - 5 16,19,26,37
≥ 0.40 (Very Good) - 1,13,15,17,31 7,25,32,33
TABLE 4.6: Item Categorisation Based On The Relationship Item Discrimination and Item
Difficulty
The table above shows the relationship between the item discrimination and item difficulty for
each item. Items which are listed in the black coloured cell need to be revised or eliminated if the
items are intended to be kept for future use.
4.3 Distracter Analysis
Distracter analysis examines the proportion of students who selected each of the response
options. According to Tucker (2007), “on a well-designed multiple choice item, high scoring
students should select the correct option even from highly plausible distracters. Those who are
ill-prepared should select randomly from available distracters. In this scenario, the item would be
a good discriminator of knowledge and should be considered for future assessments. In other
scenarios, a distracter analysis may reveal an item that was mis-keyed, contained a proofreading
error, or contains a distracter that appears plausible even by those that scored well on an
assessment”. The proportion for each option is calculated by using the following formula:
where,
Prop = Proportion student choosing the distracter
n = number of students choosing the distracter
N = total number of students taking the test
The table below shows the proportion of students in the upper and lower group who selected the
correct answers as well as the proportion of students choosing each alternative for each item.
25
Upper Group (8 Students) Lower Group (9 Students) Discrim
Item Pall -ination
Index
A B C D A B C D
*0.62 0.46
1 0.125 0.125 0.125 0.111 0.556 *0.111 0.222 0.514
5 7
26
*1.00 *0.88 0.96
2 0.000 0.000 0.000 0.000 0.000 0.111 0.111
0 9 7
*1.00 *0.88 0.93
3 0.000 0.000 0.000 0.000 0.000 0.111 0.111
0 9 3
*0.75 *0.77 0.76
4 0.125 0.000 0.125 0.222 0.000 0.000 -0.028
0 8 7
*0.87 *0.55 0.53
5 0.125 0.000 0.000 0.444 0.000 0.000 0.319
5 6 3
*0.62 *0.33 0.66
6 0.000 0.125 0.250 0.222 0.333 0.111 0.292
5 3 7
*1.00 *0.33 0.66
7 0.000 0.000 0.000 0.000 0.000 0.667 0.667
0 3 7
*1.00 *0.77 0.90
8 0.000 0.000 0.000 0.111 0.111 0.000 0.222
0 8 0
*1.00 *0.77 0.90
9 0.000 0.000 0.000 0.111 0.111 0.000 0.222
0 8 0
*1.00 *0.77 0.93
10 0.000 0.000 0.000 0.111 0.111 0.000 0.222
0 8 3
*1.00 *1.00 0.96
11 0.000 0.000 0.000 0.000 0.000 0.000 -0.125
0 0 7
*0.75 *0.66 0.63
12 0.250 0.000 0.000 0.222 0.000 0.111 0.083
0 7 3
*0.87 *0.00 0.53
13 0.000 0.000 0.125 0.111 0.556 0.333 0.875
5 0 3
*1.00 *0.77 0.86
14 0.000 0.000 0.000 0.111 0.111 0.000 0.222
0 8 7
*0.87 *0.33 0.50
15 0.000 0.125 0.000 0.222 0.111 0.333 0.542
5 3 0
*0.87 *0.55 0.73
16 0.000 0.125 0.000 0.333 0.000 0.111 0.319
5 6 3
*0.87 *0.44 0.60
17 0.000 0.125 0.000 0.111 0.444 0.000 0.431
5 4 0
*1.00 *0.77 0.83
18 0.000 0.000 0.000 0.222 0.000 0.000 0.222
0 8 3
*1.00 *0.66 0.83
19 0.000 0.000 0.000 0.222 0.111 0.000 0.333
0 7 3
*1.00 *0.88 0.96
20 0.000 0.000 0.000 0.111 0.000 0.000 0.111
0 9 7
*0.87 *0.77 0.90
21 0.000 0.000 0.125 0.000 0.000 0.222 0.097
5 8 0
*1.00 *1.00 1.00
22 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0 0 0
*1.00 *1.00 1.00
23 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0 0 0
*0.87 *1.00 0.86
24 0.000 0.000 0.125 0.000 0.000 0.000 -0.250
5 0 7
27
*1.00 *0.55 0.76
25 0.000 0.000 0.000 0.111 0.000 0.333 0.444
0 6 7
*1.00 *0.66 0.90
26 0.000 0.000 0.000 0.111 0.222 0.000 0.333
0 7 0
*1.00 *0.77 0.93
27 0.000 0.000 0.000 0.000 0.222 0.000 0.222
0 8 3
*1.00 *0.88 0.90
28 0.000 0.000 0.000 0.000 0.000 0.111 0.111
0 9 0
*1.00 *0.77 0.90
29 0.000 0.000 0.000 0.000 0.222 0.000 0.222
0 8 0
*0.62 *0.55 0.53
30 0.000 0.375 0.000 0.000 0.333 0.111 0.069
5 6 3
*0.75 *0.33 0.56
31 0.250 0.000 0.000 0.444 0.222 0.000 0.417
0 3 7
*1.00 *0.44 0.76
32 0.000 0.000 0.000 0.111 0.333 0.111 0.556
0 4 7
*1.00 *0.44 0.70
33 0.000 0.000 0.000 0.222 0.111 0.222 0.556
0 4 0
*1.00 *1.00 0.86
34 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0 0 7
*1.00 *1.00 1.00
35 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0 0 0
*1.00 *0.88 0.96
36 0.000 0.000 0.000 0.111 0.000 0.000 0.111
0 9 7
*1.00 *0.77 0.83
37 0.000 0.000 0.000 0.222 0.000 0.111 0.333
0 8 3
*1.00 *0.77 0.96
38 0.000 0.000 0.000 0.111 0.111 0.000 0.111
0 8 7
*1.00 *1.00 1.00
39 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0 0 0
*0.50 *0.44 0.63
40 0.000 0.000 0.500 0.000 0.111 0.444 -0.056
0 4 3
* Denotes Correct Answer
TABLE 4.7: The Proportion of Students in the Upper and Lower Group Who Selected
Each Option
4.4 Kuder Richardson 20
This statistic measures test reliability of inter-item consistency. A KR-20 value ranges from 0 – 1.
A higher value indicates a strong relationship between items on the test. While a lower value
28
indicates a weak relationship between test items. Therefore, a test has a better reliability when
the KR-20 value closer to 1.
The KR-20 is calculated as follows:
where,
KR-20 - Kuder Richardson 20
k - Number of items in the test
p - Item difficulty
2
S - Variance of the raw scores or standard deviation squared
The value of KR-20 calculated for this test is 0.63. It seems that this test is moderately reliable.
5.0 COMMENT AND SUGGESTION
In this chapter, we will be analyzing the item in order to discover items that consist of ambiguous
item, miskeyed, too easy or too difficult item and non-discriminating item.
The reasons for this analyzing are to enhance the technical quality of an examination by pointing
out options that are nonfunctional and that should be improved or eliminated.
5.1 INTERPRETING ITEM ANALYSIS DATA
5.1.1 Too easy items
29
Item that virtually everyone gets right are useless for discriminating among students and should
be replaced by more difficult items or eliminated. This could be seen from the proportion of
students answering an item correctly. This point may be summarized by saying that items
answered correctly by a large proportion of examinees have markedly reduced power to
discriminate.
When all items are extremely easy, most test scores will be extremely high. In either case, test
scores will show very little variability. Thus, extreme p values directly restrict the variability of
test scores. When everyone taking the test chooses the correct response as is seen in Table 1. An
item with a p value of .0 or a p value of 1.0 does not contribute to measuring individual
differences, and this is almost certain to be useless.
Example
Question 22:-
Mei Ling : Can I check out (21) this book?

Librarian : Of course, you can. Is that all?
Mei Ling : Yes, that’s all. When must I return it?
Librarian : The expiry date (22), is on the first page.
Mei Ling : I can see it. Thank you.
Librarian : Make sure when you bring back the book, it is as good as new. (23)
22. expiry date

A. date to go
B. time to leave
C. date to return
D. date of birth
30
Table 1
Maximum Item Difficulty Example Illustrating No Individual Differences
Group Item Response
Options A B C D
Upper 0 0 8 0
group
Lower 0 0 9 0
group
Note. * denotes correct response
Item difficulty: (8+9)/17 = 1.00p
Discrimination Index: (8-9)/17 = 0.05
23. it is as good as new.
A. it is still new
B. it is interesting
C. it is not dirty
D. it is in a good condition
Table 2
Minimum Item Difficulty Example Illustrating No Individual Differences
Group Item Response
Options A B C D
Upper 0 0 0 8
group
31
Lower 0 0 0 9
group
Item difficulty: (8+9)/17 = 1.00p
Discrimination Index: (8-9)/17 = 0.05
Based on table one (1) and table two (2,) we could suggest that the items must be improved by
making the distracter more attractive and these items should be replace or eliminate in order to
construct an item which is suitable with students’ level of knowledge.
5.1.2. The ambiguity of the options
One measure of item ambiguity is the extent to which students in the upper group select an
incorrect option with about the same frequency as they select the correct one. Ambiguity defined
in this way is the inability of the highest-scoring students on the test to discriminate between a
“correct” alternative and one judged by the teacher to be “wrong”. An ambiguous item could also
be defined as one that allows for more than one “correct” alternative as judged by a group of
experts, although a question that is clear to experts may be ambiguous to students who lack
understanding of the item’s content.
Example:- Item 40.
40. What was the important moral value that the people of Dalat had learned from the incident
happened in their village?
A. to value peace
B. never listen to brothers
C. customs are waste of time they should obey their siblings
D. they should obey their siblings
32
Table 3
Item Difficulty
Group Item Response
Options A B C D
Upper 4 0 0 4
group
Lower 4 0 1 4
group
Table 3, about equal numbers of top students went for A and D
In the example, the item appears to be ambiguous because the two options can be justified.
When students in the upper portion of the class select a “correct” option and an “incorrect”
option with about equal frequency, the item is ambiguous either because students lack knowledge
or because the options or the item was defective. Which of these reasons is applicable to a given
item is determined by examining the highly selected but “incorrect” options to see if more than
one answer can be justified. Our suggestion to this item is to look at their favorite alternative
again, and see if we can find any reason they could be choosing it.
5.1.3. Miskeying
33
When we first correcting students’ test paper, we have identified a problem where students have
giving the correct answer but miskeyed by us. Miskeying is another common error that can be
corrected before students’ papers are returned. One way of detecting potentially miskeyed items
is to examine the responses of the students in the upper portion of the class. An “incorrect”
option selected by a large number of these students suggests a keying error, as in the following
example:
28. A string must be tied to each __________ of the stick to make frame.
*A. end (8)

B. left (0)
C. right (0)
D. bottom (0)
Because the majority of the most capable students selected ‘end’ as the correct alternative and so
few agreed with the “keyed” answer, the teacher should check the key for a possible error.
5.1.4. The effectiveness of distracters
Analyzing the distracters is useful in determining the relative usefulness of the decoys in each
item.
Example: Item 5.
5. The word range means
A. vary
B. come
C. fly
D. watch
34
Table 4
Alternatives aren’t working
Group Item Response
Options A B C D
Upper group 7 1 0 0
Lower group 5 4 0 0
Based on the example, we could clearly identified that no one fell for options C and D. Option B
is a good distracter however options C and D is not a plausible alternatives. Our suggestion is, if
a distracter elicits very few or no responses, then it may not be functioning as a distracter and
should be replaced with a more attractive and realistic option.
Example: Item 6
6. The objective of the workshop is to
A. visit parks
B. hold a talk
C. promoting membership
D. enlighten students
Table 5
Effective distracters
Group Item Response
35
Options A B C D
Upper 0 1 2 5
group
Lower 2 3 1 3
group
Item difficulty = 0.667
Discrimination Index = 0.292
Distracters should be carefully examined when items show large positive p-values. Our comment
is, items should be modified if students consistently fail to select certain multiple choice
alternatives. The alternatives are probably totally implausible and therefore of little use as decoys
in multiple choice items. Also some distracters may be too appealing, causing the item to be too
difficulty, discriminateable, or variability can be redeemed by the revision of one or two of the
response options.
5.1.5. The presence of guessing
Based on the item analysis, we could clearly identify that the students have answer the question
without guessing or respond to the questions randomly. This is mainly because of; the test was
given to an advance class student and due to easy items given.
6.0 CONCLUSIONS
Overall, we could say that this test is moderately reliable. However, most of the items are too
easy to be given to advance level students. As a result, most of the items were not functioning
very well and could not perfectly differ between upper and lower students. Nevertheless, the item
36
analysis statistics of item discrimination (IF) and item difficulty (ID) help us to decide which
items to keep and which to discard in creating a new revised version of the test. Moreover, the
distracter efficiency analysis is also useful for spotting items that are miskeyed and for tuning up
those items that have options that are not working as would be expected.
7.0 REFERENCES
Ebel, R.L. (1979). Essentials of educational measurement (3rd ed.). Englewood Cliffs, NJ:
Prentice Hall.
Henning, G. (1987) A Guide To Language Testing Development, Evaluation, Research. London:
Newbury House Publisher.
Tucker, S. (2007, September 21). Retrieved June 26, 2008, from University of Maryland:
http://www.umaryland.edu/cits/testscoring/pdf/sop_deconstructingtestscoring.pdf
37
8.0 APPENDIX
38

Item Analysis

Transféré par

Informations du document

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Item Analysis

Transféré par

Droits d'auteur :

Formats disponibles

1.

This multiple choice test paper is designed to fulfill several objectives as

1) It is a part of the course’s assessment (DPP 406 – Pentaksiran Pembelajaran).

2) It can be used to see and evaluate the students’ behaviour.

4) It can measure the students’ factual and procedural knowledge.

procedures which have only one clear correct answer.

the particular question.

out with solutions to improve the particular item.

test papers to the Form 1 students with the advanced level.

improvements on weak items.

Subject: English Language (B.I)

BLOOM’S TAXONOMY (LEARNING OUTCOMES)

• Item No.1 1) Students will be able to

• Item No.3 3) Students will be able to

• Item No.4 4) Students will be able to

• Item No.5 5) Students will be able to

• Item No.7 7) Students will be able to

• Item No.8 8) Students will be able to

• Item No.9 9) Students will be able to

• Item No.13 13) Students will be able to

• Item No.14 14) Students will be able to

• Item No.15 15) Students will be able to

• Item No.17 17) Students will be able to

• Item No.18 18) Students will be able to

• Item No.19 19) Students will be able to

• Item No.20 20) Students will be able to

• Item No.21 21) Students will be able to

• Item No.22 22) Students will be able to

• Item No.24 24) Students will be able to √

• Item No.26 26) Students will be able to

• Item No.27 27) Students will be able to

• Item No.28 28) Students will be able to

• Item No.30 30) Students will be able to

• Item No.31 31) Students will be able to

• Item No.32 32) Students will be able to

• Item No.34 34) Students will be able to

• Item No.35 35) Students will be able to

• Item No.37 37) Students will be able to

• Item No.38 38) Students will be able to

• Item No.39 39) Students will be able to

TOTAL 40 Items / Behavioral Objectives 5 24 9 2 - -

THROUGH FREQUENCY, MEAN, MODE, VARIANCE AND

STUDENTS SCORE (x) x-m (x-m)2

Mean (m) Mode Variance (s2) Standard Deviation (SD)

TABLE 3.1: Mean, mode, variance and standard deviation.

Histogram 1: Skewed Distribution

and the poor students.

STUDENTS SCORE (x)

4.1 Item Difficulty

calculated using the following formula:

Upper Group Lower Group Middle

using the Henning’s (1987) guidelines as shown in the following table:

High Medium Low

Level of difficulty Items Count

these test items.

4.2 Item Discrimination

considerable revision or elimination.

items are intended to be kept for future use.

4.3 Distracter Analysis

4.4 Kuder Richardson 20

the KR-20 value closer to 1.

The KR-20 is calculated as follows:

5.0 COMMENT AND SUGGESTION