Item analysis is a process which examines student responses to individual test items (questions) in order
to assess the quality of those items and of the test as a whole. Item analysis is especially valuable in
improving items which will be used again in later tests, but it can also be used to eliminate ambiguous or
misleading items in a single test administration. In addition, item analysis is valuable for increasing
instructors’ skills in test construction, and identifying specific areas of course content which need greater
emphasis or clarity.

 Item:
 A single task or question that usually cannot be broken down into any smaller unit.
 An arithmetical problem may be an item, a manipulative task may be an item, and a mechanical
puzzle may be an item.
 Analysis:
Analysis is the process of breaking a complex topic or substance into smaller parts in order to gain a
better understanding of it.
 Item analysis:
 A type of analysis used to assess whether items on a scale are tapping the same construct and
are sufficiently discriminating (Polit & Beck).
 The procedure used to judge the quality of an item (KP Neeraja).
 A statistical technique used for selecting and rejecting the items of a test on the
basis of their difficulty value and discriminating power.

 An item should be phrased in such a manner that there is no ambiguity regarding its meaning
for both the item writers as well as the examinees who take the test.
 The item should not be too easy or too difficult.
 It should have discriminating power; that is, it must clearly distinguish between those who
possess a trait and those who do not.
 It should not be concerned with trivial aspects of the subject matter; that is, it must measure
only the significant aspects of knowledge or understanding.
 As far as possible, it should not encourage guesswork by the subject.


 To prepare the final draft properly (items ordered from easy to difficult).
 To indicate the modifications to be made in some of the items.
 To obtain the discriminatory power (D.P) of each item, i.e. how well it differentiates between
capable and less capable examinees.
 To obtain information about the difficulty value (D.V) of all the items.
 To select appropriate items for the final draft.


 Interpretation of student’s performance
 Identify weakness
 Select best questions
 Evaluate the students.

 Control the quality of a test.

 To become more competent teachers.
 Understand the behavior of each item.
 Reveal the facility value and the discrimination of each item.
 Find out the performance of the group.
 Point out problems with test validity so that ineffective items can be revised or eliminated.


There are three common types of item analysis, which provide teachers with three different types of
information:

 Item Difficulty :
“The difficulty value of an item is defined as the proportion or percentage of the examinees who have
answered the item correctly” - J.P. Guilford

Teachers often wish to know how "hard" a test question or performance task was for their students. To
help answer that question, they can produce a difficulty index for a test item by calculating the proportion of
students in class who got that particular item correct. The larger the proportion, the more students there are
who have learned the content measured by the item. Although we call this proportion a difficulty index, the
name is counterintuitive. That is, one actually gets a measure of how easy the item is, not how difficult it
is. Thus, a big number means easy, not difficult.

 Item discrimination (Item validity):

“Index of discrimination is that ability of an item on the basis of which the discrimination is made
between superiors and inferiors” - Blood and Budd (1972)

Another concern of teachers related to testing involves the fundamental validity of a given test; that is,
whether a single test item measures the same thing or assesses the same objectives as the rest of the test. A
number can be calculated that provides that information in a fairly straightforward way. The discrimination
index is a rough indication of the validity of an item. As such, it is a measure of an item’s ability to
discriminate between those who scored high on the total test and those who scored low. Once computed, this
index may be interpreted as an indication of the extent to which overall knowledge of the content area or
mastery of the skills is related to the response on an item. Perhaps the most crucial validity standard for a test
item is whether or not a student’s correct answer is due to his or her level of knowledge or ability and not
due to something else such as chance or test bias. An item that can discriminate between students with high
knowledge or ability and those with low knowledge or ability (as measured by the whole test) should be
considered an item that "works." Discrimination, in this case, is a good thing.

Types of Discrimination Index (D.I)

 Zero discrimination or No discrimination
 Positive discrimination
 Negative discrimination

 Zero discrimination or No discrimination:

 The item is answered correctly by all the examinees.
 The item is not answered correctly by any of the examinees.

 Positive discrimination:
An item is correctly answered by superiors and is not answered correctly by inferiors. The
discriminative power ranges from -1 to +1.

 Negative discrimination:
An item is correctly answered by inferiors and is not answered correctly by superiors.

 Distractors
In addition to examining the performance of an entire test item, teachers are often interested in
examining the performance of individual distractors (incorrect answer options) on multiple-choice items. By
calculating the proportion of students who choose each answer option, teachers can identify which
distractors are "functioning" and appear attractive to students who do not know the correct answer, and
which distractors are simply taking up space and are not chosen by many students. To reduce blind
guessing, which results in a correct answer purely by chance (and hurts the validity and reliability of a
test item), teachers want as many plausible distractors as is feasible. Analysis of response options allows
teachers to fine-tune and improve items they may wish to use again with future classes.
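As an illustration of the proportion calculation described above, here is a minimal Python sketch. The function names, the sample responses and the 5% cut-off for a "non-functioning" distractor are assumptions made for this example; they are not taken from the source.

```python
from collections import Counter


def option_proportions(responses, options):
    """Return the share of students who chose each answer option.

    responses: one chosen option label per student (None = item omitted).
    options:   all option labels of the item, e.g. ["A", "B", "C", "D"].
    """
    if not responses:
        raise ValueError("no responses to analyse")
    counts = Counter(r for r in responses if r is not None)
    return {opt: counts[opt] / len(responses) for opt in options}


def weak_distractors(responses, options, key, threshold=0.05):
    """List distractors chosen by fewer than `threshold` of the students.

    The 5% default is an assumed rule of thumb, not a value from the text.
    """
    shares = option_proportions(responses, options)
    return [opt for opt in options if opt != key and shares[opt] < threshold]


# Hypothetical responses of 20 students to one item whose key is "B".
responses = ["B"] * 11 + ["A"] * 5 + ["C"] * 4
print(option_proportions(responses, ["A", "B", "C", "D"]))
print(weak_distractors(responses, ["A", "B", "C", "D"], key="B"))  # ['D']
```

In this made-up data nobody chose option D, so it is reported as a distractor that is only taking up space.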


 Award of a score to each student
 Ranking in order of merit
 Identification of groups: high and low
 Calculation of the difficulty index of a question
 Calculation of the discrimination index of a question
 Critical evaluation of each question enabling a given question to be retained, revised or rejected

1. Award of a score to each student:

A practical, simple and rapid method is to perforate an answer sheet. By placing the perforated
sheet on the student's answer sheet, the raw score (number of correct answers) can be found quickly.
2. Ranking in order of merit:

This step consists merely in ranking (listing) students in order of merit (in relation to the score)
proceeding from the highest to the lowest score.

3. Identification of high and low groups:

Ebel suggests the formation of “high” and “low” groups comprising only the first 27% (high group)
and the last 27% (low group) of all the students ranked in order of merit.

Why 27%?
Because 27% gives the best compromise between two desirable aims:
 making both groups as large as possible;
 making the two groups as different as possible.

Truman Kelley showed in 1939 that when each group consists of 27% of the total it can be said
with the highest degree of certainty that those in the high group are really superior (with respect to the
quality measured by the test) to those in the low group. If a figure of 10% were taken, the difference between
the two means of the competence of the two groups would be greater but the groups would be much smaller
and there would be less certainty regarding their mean level of performance.

Similarly, if a figure of 50% was taken the two groups would be of maximum size but since the
basis of our ranking is not absolutely accurate, certain students in the high group would really belong to the
low group, and vice versa.

While 27% is the best choice in theory, it is not appreciably better than 25% or 33%; if it is
preferred to work with 1/4 or 1/3 rather than with the somewhat odd figure of 27%, there is no great
disadvantage in so doing.

For the rest of our analysis we shall use 33%.
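Steps 2 and 3 (ranking and forming the high and low groups) can be sketched in Python as follows. The function name, the student identifiers and the scores are hypothetical, and the way the group size is rounded is an assumption of this sketch.

```python
def split_high_low(scores, fraction=1/3):
    """Rank students by score and return (high_group, low_group).

    scores:   dict mapping a student id to a raw score.
    fraction: share of students placed in each group (e.g. 0.27, 0.25, 1/3).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two students")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Rank from highest to lowest score; tied students keep their input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: round to the nearest whole student, never let groups overlap.
    size = max(1, min(round(n * fraction), n // 2))
    return ranked[:size], ranked[-size:]


# Hypothetical raw scores of nine students.
scores = {"s1": 42, "s2": 38, "s3": 35, "s4": 31, "s5": 30,
          "s6": 27, "s7": 22, "s8": 18, "s9": 11}
high, low = split_high_low(scores)          # 33% in each group
print(high)  # ['s1', 's2', 's3']
print(low)   # ['s7', 's8', 's9']
```

Calling `split_high_low(scores, fraction=0.27)` on the same data would place two students in each group instead of three.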

4. Calculation of the difficulty index of a question:

This is an index for measuring the easiness or difficulty of a test question. It is the percentage (%) of
students who have correctly answered a test question; it would be more logical to call it the easiness index.
It can vary from 0 to 100%.

The formula for difficulty index (D.I):

$$\text{D.I} = \frac{\text{R.H} + \text{R.L}}{\text{N}} \times 100$$

R.H – number of students in the high group who answered correctly

R.L – number of students in the low group who answered correctly

N – total number of students in both groups

If some examinees did not respond to the item, then:

The formula for difficulty index (D.I):

$$\text{D.I} = \frac{\text{R.H} + \text{R.L}}{\text{N} - \text{N.R}} \times 100$$

R.H – number of students in the high group who answered correctly

R.L – number of students in the low group who answered correctly

N – total number of students in both groups

N.R – number of examinees who did not respond
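A small Python sketch of the difficulty index calculation, covering both the basic case and the case with non-responding examinees. The function and parameter names and the sample counts are assumptions chosen for this example; the result is reported as a percentage.

```python
def difficulty_index(right_high, right_low, n_total, n_no_response=0):
    """Percentage of examinees in the two groups who answered correctly.

    right_high:    correct answers in the high group (R.H)
    right_low:     correct answers in the low group (R.L)
    n_total:       students in both groups together (N)
    n_no_response: examinees who did not respond (N.R), 0 if none
    """
    answered = n_total - n_no_response
    if answered <= 0:
        raise ValueError("no examinee attempted the item")
    return (right_high + right_low) / answered * 100


# Hypothetical item: 10 students per group, 8 correct in high, 4 in low.
print(difficulty_index(8, 4, 20))       # (8 + 4) / 20 * 100
print(difficulty_index(8, 4, 20, 4))    # (8 + 4) / (20 - 4) * 100
```

With these made-up counts the item is answered correctly by 60% of the two groups, or by 75% of those who actually responded when four examinees left it blank.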

General guidelines for difficulty index (D.I):

A low difficulty index means that the item is difficult.

Ex: D.I = 0.20 » only 20% answered the item correctly, so the item is too difficult.

A high difficulty index means that the item is easy.

Ex: D.I = 0.80 » 80% answered the item correctly, so the item is too easy.


D.I value    Interpretation
0.20-0.30    Most difficult
0.30-0.40    Difficult
0.40-0.60    Moderate difficulty
0.60-0.70    Easy
0.70-0.80    Most easy


5. Calculation of the discrimination index of a question:

This is an indicator showing how well a question discriminates between “high” and “low”
students. It varies from -1 to +1.

The formula for discrimination index (D.I):

$$\text{D.I} = 2 \times \frac{\text{R.H} - \text{R.L}}{\text{N}}$$

R.H – number of students in the high group who answered correctly

R.L – number of students in the low group who answered correctly

N – total number of students in both groups
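The discrimination index can be sketched in Python in the same way. The function name and the sample counts are assumptions for this example; the three calls illustrate positive, zero and negative discrimination.

```python
def discrimination_index(right_high, right_low, n_total):
    """Discrimination index: 2 * (R.H - R.L) / N, ranging from -1 to +1.

    right_high: correct answers in the high group (R.H)
    right_low:  correct answers in the low group (R.L)
    n_total:    students in both groups together (N)
    """
    if n_total <= 0:
        raise ValueError("the groups must contain at least one student")
    return 2 * (right_high - right_low) / n_total


# Hypothetical items, each analysed with 10 students per group (N = 20).
print(discrimination_index(8, 4, 20))   # positive: high group does better
print(discrimination_index(5, 5, 20))   # zero: both groups do equally well
print(discrimination_index(3, 7, 20))   # negative: low group does better
```

The factor 2 compensates for N counting both groups: if every high-group student and no low-group student answers correctly, the index reaches +1.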

General guidelines for discriminating index (D.I.):

According to Ebel,


Discrimination index            Evaluation
Greater than or equal to 0.40   Very good item
0.30-0.39                       Reasonably good but subject to improvement
0.20-0.29                       Marginal item, needs improvement
Less than 0.19                  Poor item, to be rejected or revised
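The guideline above can be turned into a small lookup, sketched here in Python. The function name is hypothetical, and two details are assumptions of this sketch: the cut-offs are applied to unrounded values, and anything below 0.20 (including the gap between 0.19 and 0.20) is treated as a poor item.

```python
def ebel_rating(discrimination):
    """Map a discrimination index to the evaluation in Ebel's guideline."""
    if discrimination >= 0.40:
        return "Very good item"
    if discrimination >= 0.30:
        return "Reasonably good but subject to improvement"
    if discrimination >= 0.20:
        return "Marginal item, needs improvement"
    # Assumption: everything below 0.20 counts as poor.
    return "Poor item, to be rejected or revised"


for value in (0.45, 0.33, 0.25, 0.10, -0.20):
    print(value, "->", ebel_rating(value))
```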

6. Critical evaluation of a question:

This is based on the indexes obtained.

Difficulty index: the higher this index, the easier the question; the term is thus illogical. It is
sometimes called the “easiness index”, but in the American literature it is always called the “difficulty index”.
In principle, a question with a difficulty index lying between 30% and 70% is acceptable (in that
range, the discrimination index is more likely to be high). But some authors give values between 35% and …
If a teacher builds a test from a group of questions with indexes in the range 30% - 70%, then the mean index
will be around 50%. It has been shown that a test with a difficulty index in the range of 50% - 60% is very
likely to be reliable as regards its internal consistency or homogeneity.

Discrimination index: the higher the index the more a question will distinguish (for a given group of
students) between “high” and “low” students. When a test is composed of questions with high discrimination
indexes, it ensures a ranking that clearly discriminates between the students according to their level of
performance, i.e., it gives no advantage to the low group over the high group. In other words, it helps you to
find out who are the best students.


 Both (D.V & D.I) are complementary, not contradictory, to each other.
 Both should be considered in selecting good items.
 An item with negative or zero discrimination is to be rejected, whatever its
difficulty value.
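One possible way to combine the two indexes into a retain / revise / reject suggestion is sketched below in Python. Only the first rule (reject on zero or negative discrimination) comes directly from the text; the "retain" cut-offs reuse the 30% - 70% difficulty range and the 0.30 discrimination threshold mentioned earlier, and sending everything else to "revise" is purely an assumption of this sketch. The function name is hypothetical.

```python
def review_item(difficulty_pct, discrimination):
    """Suggest 'retain', 'revise' or 'reject' for one item.

    difficulty_pct: difficulty index as a percentage (0-100).
    discrimination: discrimination index (-1 to +1).
    """
    if discrimination <= 0:
        # Zero or negative discrimination: reject whatever the difficulty.
        return "reject"
    if 30 <= difficulty_pct <= 70 and discrimination >= 0.30:
        # Assumed cut-offs for keeping an item unchanged.
        return "retain"
    # Assumption: every remaining item is sent back for revision.
    return "revise"


print(review_item(50, 0.45))   # retain
print(review_item(85, 0.25))   # revise
print(review_item(50, -0.10))  # reject
```

A teacher would still read each flagged item before acting on the suggestion; the function only organises the numbers.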

Distractor analysis indicates the effectiveness of the distractors in multiple-choice items. Multiple-choice
items are among the most powerful and flexible objective items, and many standardized tests use this form
of item. A thorough item analysis is therefore done to indicate the extent to which the distractors or foils
are effective in each item.

Item analysis, especially in multiple-choice items, aims at determining the effectiveness of the
distractors. Indices of item difficulty and discrimination power simply indicate whether an item is doing its
intended job, not why or why not. If a particular item is not functioning well or is found to be defective, we
have to examine the possible reasons, and one obvious method is to examine the distractibility of the incorrect
options.
The test item is a potential miskey if more students from the upper group choose the incorrect
options than the key.
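The miskey check can be sketched in Python as below. The function name and the sample responses are hypothetical. The sketch assumes one reading of the rule: an item is flagged when any single incorrect option attracts more upper-group students than the keyed answer.

```python
from collections import Counter


def potential_miskey(upper_group_responses, key):
    """True if some incorrect option drew more upper-group students than the key.

    upper_group_responses: chosen option per upper-group student (None = omitted).
    key:                   label of the option scored as correct.
    """
    counts = Counter(r for r in upper_group_responses if r is not None)
    return any(count > counts[key]
               for option, count in counts.items() if option != key)


# Hypothetical upper group of 10 students; the answer key says "B".
print(potential_miskey(["C"] * 6 + ["B"] * 3 + ["A"], key="B"))  # True
print(potential_miskey(["B"] * 7 + ["C"] * 2 + ["A"], key="B"))  # False
```

A flagged item is only a candidate: the key may be wrong, or the item may have one of the defects listed next.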

 It is not taught in the class properly.
 It is ambiguous.
 The correct answer is not among the given options.
 It has more than one correct answer.
 It contains grammatical clues that mislead the students.
 The student is not aware of the content.
 The students were confused by the logic of the question because it has double negatives.
 The student failed to study the lesson.


 Judge the worth or quality of a test.
 Aids in subsequent test revision.
 Increase skills in test construction.
 Planning future activities.
 Basis for discussing test result.
 Promotion of students to the next higher grade.
 Improve teaching methods and techniques.

Author: Quaigrain K & Arhin AK
Research title: “Using reliability and item analysis to evaluate a teacher-developed test in educational
measurement and evaluation.”
Aim: To test the quality of the test items and explore the relationship of the difficulty index (p-value) and
the discrimination index (DI) with distractor efficiency (DE).
Sample: First-year postgraduate students pursuing Diploma in Education program.
Sample Size: 247 first-year students
Setting: Cape Coast Polytechnic during the 2016 academic session.
Method: Fifty multiple-choice questions were administered as an end-of-semester examination in the
Educational Measurement course. The MCQs were analyzed for their level of difficulty, as measured by the
difficulty index (p-value), their power of discrimination, as measured by the discrimination index (DI), and
the performance of all non-correct options (distractor analysis). The data analysis also included point-biserial
correlations and the Kuder–Richardson formula (KR-20), which was used to assess the internal reliability of
the test scores. The results showed the reliability and quality of the test items included in the test.
Results: The scores of 247 students ranged from 11 to 42 (out of 50). The mean test score was 29.23 and
the standard deviation was 6.36. The median score was 30 and the inter-quartile range value was 9. The
median score is slightly greater than the mean score. The skewness and kurtosis values for the scores were
−0.370 and −0.404, respectively. Mean scores according to groups were: lower: 20.62; middle: 29.28;
upper: 36.55. The reliability measured by KR-20 was 0.77. The mean difficulty index was 58%, that
is, p = 0.58.

Item analysis is a process which examines student responses to individual test items (questions) in
order to assess the quality of those items and of the test as a whole. There are three common types of item
analysis, which provide teachers with three different types of information: item difficulty, discrimination
index and effectiveness of distractors. Under discrimination index there are three types: zero discrimination or
no discrimination, positive discrimination and negative discrimination. There are six steps of item analysis:
award of a score to each student, ranking in order of merit, identification of groups (high and low),
calculation of the difficulty index of a question, calculation of the discrimination index of a question and
critical evaluation of each question, enabling a given question to be retained, revised or rejected. The results
of item analysis are used for judging the worth or quality of a test, planning future activities and improving
teaching methods and techniques.

Item analysis is an important (probably the most important) tool to increase test effectiveness. Each
item's contribution is analyzed and assessed.
To write effective items, it is necessary to examine whether they are measuring the fact, idea, or
concept for which they were intended. This is done by studying the students’ responses to each item. When
formalized, the procedure is called “item analysis”. It is a scientific way of improving the quality of tests and
test items in an item bank.


 Guilbert JJ. Educational handbook for health personnel. 6th Edition. Geneva,
Switzerland. World Health Organization (WHO). 1987.

 Kamal AS. Item Analysis [Internet]. 2015 [update 2015 Sep. 07; cited 2019 March 14].
Available from: https://www.slideshare.net/aneez103/item-analysis-52481431

 Quileste R. Item Analysis - Discrimination and Difficulty Index [Internet]. 2015 [update
2015 Oct. 01; cited 2019 March 15]. Available from:

 Quaigrain K & Arhin AK. Using reliability and item analysis to evaluate a teacher-
developed test in educational measurement and evaluation. Cogent Education. 2017
March 16; 4(1301013): 1-11. http://dx.doi.org/10.1080/2331186X.2017.1301013

 Arunkumar K. Item Analysis [Internet]. 2013 [update 2013 Jan. 20; cited 2019 March
14]. Available from: https://www.slideshare.net/energeticarun/item-analysis-16082143

 Item Analysis [Internet]. 2019 [update 2019 Jan 30; cited 2019 March 14]. Available
from: https://en.wikipedia.org/wiki/Item_analysis