Item Analysis

Reference: Anastasi, A. (1988). Psychological testing, 6th edition. New York: Macmillan Publishing Co.

Item Difficulty
Item difficulty is used to choose items of a
suitable difficulty level
Percentage passing: the difficulty of an item is usually expressed as the percentage of persons who answer it correctly
Tests are usually arranged with items in
order of difficulty, beginning with easier
items
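
As a minimal sketch of the percentage-passing idea, the snippet below computes p for each item from a table of 0/1 item scores and then orders the items from easiest to hardest. The function name `item_difficulty` and the sample data are hypothetical, chosen only for illustration.

```python
def item_difficulty(responses):
    """Return p for each item: the proportion of examinees who got it right.

    `responses` holds one row per examinee; each row has a 1 (correct)
    or 0 (wrong) for every item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees
            for j in range(n_items)]


# Hypothetical data: 4 examinees, 3 items.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
p_values = item_difficulty(scores)

# Item indices ordered from easiest (highest p) to hardest (lowest p).
easiest_first = sorted(range(len(p_values)),
                       key=lambda j: p_values[j], reverse=True)
```

Here item 0 is answered correctly by everyone, so it would come first in a test arranged by difficulty.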

What are the ranges for item difficulty?
Item difficulty (p) ranges from 0 to 1.0
A zero means that no one got the item right,
while a 1.0 means that everyone got the
item right
The closer an item gets to 0 or to 1.0, the
less information it contributes about test
takers

What levels of item difficulty are desirable in a test?
It is best to choose items close to p = .5
However, if the items within a test are highly intercorrelated (the test is more homogeneous), then there should be a wider range of item difficulties (but they should still average .5)

Relating Item Difficulty to Testing Purpose
The level of item difficulty required
depends, in part, on the purpose of the test
For screening tests, the item difficulty
should be close to the desired selection ratio
For example, if you want to select the upper
30% of the cases, then p=.30
If the purpose is to test for mastery, then p is set higher, probably around .80 or .90

Item Discrimination (D)


Item discrimination refers to whether an item can
distinguish between people who scored high or
scored low on a test
This is calculated by a correlation between each
item score and the total test score
Item score calculation:
= 1 (if the item was answered correctly)
= 0 (if the item was answered incorrectly)
A point-biserial correlation is used (a special case of the Pearson correlation)
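
The item-total correlation described above can be sketched as follows. Because the point-biserial correlation is a special case of Pearson's r, the code simply applies the Pearson formula to 0/1 item scores and total scores. The function name and the sample data are hypothetical, and the total score here is assumed to include the item itself.

```python
import math


def point_biserial(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores.

    Returns None when the correlation is undefined, i.e. when everyone
    has the same item score or the same total score.
    """
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    ss_x = sum((x - mean_x) ** 2 for x in item_scores)
    ss_y = sum((y - mean_y) ** 2 for y in total_scores)
    if ss_x == 0 or ss_y == 0:
        return None
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: the two highest scorers got the item right.
item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
r = point_biserial(item, totals)
```

A positive value of r means that people who answered the item correctly tended to score higher on the test as a whole.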

Shortcut for Calculating D


1. Rank examinees by total test score and divide the group in half (high and low)
2. Count # in the high group who got item right (RH)
3. Count # in the low group who got item right (RL)
Then, D = (RH - RL) / (0.5 * N)
As an alternative, if N is very large, then use the top 27% and the bottom 27% and leave out the middle portion
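
The shortcut can be sketched in code as below. The function name, the `fraction` parameter, and the sample data are hypothetical. Two assumptions are made that the slides do not spell out: ties in total score are broken by the order in which examinees appear, and when the 27% groups are used the denominator is the size of one group (which equals 0.5 * N in the half-split case).

```python
def discrimination_index(item_scores, total_scores, fraction=0.5):
    """D = (RH - RL) / group size, using the top and bottom `fraction`
    of examinees ranked by total test score."""
    n = len(item_scores)
    group_size = int(n * fraction)
    if group_size == 0:
        raise ValueError("not enough examinees to form the two groups")
    # Examinee indices ranked from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high = order[:group_size]
    low = order[n - group_size:]
    rh = sum(item_scores[i] for i in high)  # correct answers in high group
    rl = sum(item_scores[i] for i in low)   # correct answers in low group
    return (rh - rl) / group_size


# Hypothetical data: 6 examinees, only the top half answered correctly.
totals = [9, 8, 7, 3, 2, 1]
d_good = discrimination_index([1, 1, 1, 0, 0, 0], totals)
# Same totals, but only the bottom half answered correctly.
d_reversed = discrimination_index([0, 0, 0, 1, 1, 1], totals)
```

Passing `fraction=0.27` gives the top and bottom 27% variant and leaves the middle examinees out of the count.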

Example
If N = 100, everyone in the high group gets the item right, and no one in the low group does:
D = (50 - 0)/50 = 1
This is the highest that D can be
If D = 1, then the item perfectly discriminates between high and low scoring groups

If D = 0, then the item doesn't discriminate at all
If D = -1, then the item discriminates in the wrong direction. It tells you that something is wrong with the item (people who scored high on the test missed that item, while people who scored low got it right).
Whenever you get a negative value for D, the item needs to be reviewed

What Values of D are Desirable?


To maximize internal consistency we want items with a value of D close to 1.0. We want to eliminate items with a value close to 0 as well as items with negative values
Relationship between p and D:
D can reach +1.0 only if p = .5; the closer p gets to 0 or to 1.0, the lower the maximum possible D
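
The link between p and D can be worked out under the half-split shortcut D = (RH - RL)/(0.5 N), assuming an even N. The symbol D_max (the largest D an item of difficulty p could possibly reach) is introduced here only for illustration and does not come from the reference.

```latex
\begin{aligned}
R_H + R_L &= pN, \qquad 0 \le R_H,\, R_L \le 0.5N \\
p \le 0.5:&\quad R_H = pN,\ R_L = 0
  \;\Rightarrow\; D_{\max} = \frac{pN}{0.5N} = 2p \\
p > 0.5:&\quad R_H = 0.5N,\ R_L = pN - 0.5N
  \;\Rightarrow\; D_{\max} = \frac{N - pN}{0.5N} = 2(1 - p) \\
D_{\max} &= 2\min(p,\ 1 - p)
\end{aligned}
```

So p = .5 is what makes a D of 1.0 possible, but it does not guarantee it. An item with p = .9, for example, cannot exceed D = .2.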

Distractor Analysis
Distractor analysis looks at the proportion of examinees who choose each distractor in multiple choice tests
This analysis tells you what types of mistakes are being made
In looking at a discrimination index for a distractor, you want to get a negative value: choosing that distractor should correlate negatively with total test score, so that high scorers pick it less often than low scorers

Conversely, there should be a positive correlation for the correct choice
This means that people who scored high on
the test chose the correct response for a
particular item
(Example of distractor analysis on handout)
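
Since the handout is not reproduced here, the following is a minimal sketch of one way to run a distractor analysis on a single multiple choice item. It applies the same half-split discrimination index to every answer option. The function name and the data are hypothetical, and option "A" is assumed to be the keyed (correct) answer.

```python
from collections import Counter


def option_discrimination(choices, total_scores):
    """For each answer option, return (high-group count - low-group count)
    divided by the group size, using a half split on total test score."""
    n = len(choices)
    half = n // 2
    # Examinee indices ranked from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high = Counter(choices[i] for i in order[:half])
    low = Counter(choices[i] for i in order[n - half:])
    return {option: (high[option] - low[option]) / half
            for option in sorted(set(choices))}


# Hypothetical data: 6 examinees answering one item keyed "A".
choices = ["A", "A", "A", "B", "C", "B"]
totals = [9, 8, 7, 3, 2, 1]
table = option_discrimination(choices, totals)
```

In this made-up case the keyed option gets a positive index and both distractors get negative ones, which is the desirable pattern.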

Item Response Theory (IRT)


Also called Latent Trait Theory, Item
Characteristic Curve (ICC) Theory

IRT gives information about how an item functions within a test
It takes ability levels into account
Published information about item analysis
can help test users in their evaluation of
these tests
Using item analysis, test developers can
improve the reliability of their instruments

Item-test regression: combines item difficulty and item discrimination in the same graph
This allows us to visualize how effectively
an item functions

[Figure: item-test regression for items 7 (orange) and 13 (purple); axis ticks run from 0 to 1]

Basic Features of IRT


Item performance is related to the estimated amount of the respondent's latent trait (this is a statistical concept, not a psychological one)
Item characteristic curves (ICCs) are plotted
from mathematically derived functions.
Different IRT models use different
assumptions
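
As one illustration of a mathematically derived ICC, the sketch below uses the two-parameter logistic (2PL) model, a common IRT model. The slides do not say which model they have in mind, so the choice of model and the names `theta` (latent trait level), `a` (discrimination, the slope), and `b` (difficulty, the location) are assumptions.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: respondent's latent trait level
    a:     item discrimination (steepness of the curve)
    b:     item difficulty (trait level where the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items with the same difficulty but different slopes.
steep = [icc_2pl(t, a=2.0, b=0.0) for t in (-1.0, 0.0, 1.0)]
gradual = [icc_2pl(t, a=0.5, b=0.0) for t in (-1.0, 0.0, 1.0)]
```

Both curves pass through 0.5 at theta = b, but the steeper item separates respondents just below and just above that point more sharply.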

Examples of ICCs
The item discrimination parameter indicates the slope of the curve
A more gradual slope (item 3, green) has a lower discriminative value
[Figure: ICCs for 3 items; axis ticks run from 0 to 1]

How does the ICC relate to item bias?
A set of items is judged unbiased if the
ICCs for every item are the same for both
groups.
This is because irrelevant sources of
variance (as opposed to item bias) will
affect both groups the same way
Unequal ICCs give evidence of item bias

Advantages and Disadvantages


ICC scaling allows for test-free
measurement: you can compare people on a
trait even if they answered different test
items
However, a large sample (approx. 1000) is
needed to estimate the item parameters
