Académique Documents
Professionnel Documents
Culture Documents
I. INTRODUCTION
010103-1
010103-2
TABLE I. Statistical information for all groups of students NSM stands for gymnasiums which specialize
in natural sciences and mathematics, GE for gymnasiums of general education type, and FL or CL for
gymnasiums which specialize in foreign or classical languages.
N
GE, FL or CL, and NSM 1676
GE and FL or CL
1247
NSM
429
University
141
23.3
23.3
33.3
63.3
010103-3
15.2
12.3
19.2
20.8
0.4
0.3
0.9
1.8
Pni =
eBnDi
.
1 + eBnDi
010103-4
an easy item means that some able students have unexpectedly failed on this item. Large outfit of a hard item means
that some students of low ability have unexpectedly succeeded on that item. Large infit values are generally considered more problematic than large outfit values. In this study
observed outfit values are larger than infit values, so only
outfit analyses are presented.
The expected value of both infit and outfit is 1. Items
which are sufficiently in accordance with the Rasch model to
010103-5
FIG. 4. Color The bubble chart for the gymnasium students showing outfit mean square statistics MNSQ vs item measure.
be productive for measurement will have infit and outfit values between 0.5 and 1.5 20. Items with significant infit or
outfit 1.21.5 should however be inspected more closely to
find out reasons for their misfit. When the data fit well they
indicate that the subscale items all contribute to a single
underlying construct, but largely misfitting items do not contribute to the underlying construct, either because they are
badly formulated or because they measure something different than the rest of the items.
To inspect closer the structure of the test a bubble chart
19 of items is presented in Fig. 4. Each item is represented
with a circle, whose size is proportional to the Rasch standard error of items calibration. Smaller circles represent
items with smaller uncertainty of calibration. It is useful to
think of items in a bubble chart as stepping stones 19
which define the direction of the underlying variable. In a
well constructed test circles on the bubble chart are clearly
separated but not with very large gaps. The horizontal distance of circles from the expected outfit value of 1 indicates
how well each item fits to the model. Ideally, items should be
close to the central axis of the bubble chart.
Inspection of Fig. 4 reveals that some circles overlap, and
many are very close in difficulty, thus making the ordering of
the items unclear. It is also noticeable that the circles become
larger for harder items on the top of the diagram because
these calibrations were estimated from a smaller number of
responses.
Items which are far away from the expected value of 1 are
either items located at the left ends of bubble chart items 10
and 28, which show more regular answer pattern than predicted by the model, or items located at the right end of the
chart items 5, 11, 13, and 18, which show too unpredictable
answer pattern. In deterministic models one expects a very
regular pattern of answers: student should succeed on all
items of difficulties below their ability and fail on all those
010103-6
TABLE II. Statistical information for items for the Rasch analysis of the gymnasium sample. Displayed
are total raw score, item measure in logit, Rasch standard error, infit and outfit MNSQ statistics, and pointmeasure correlation for each item.
Entry No.
Total score
Measure
Rasch S.E.
Infit MNSQ
Outfit MNSQ
Correlation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
606
261
872
409
208
1206
812
656
601
424
183
1023
143
334
256
619
183
215
568
453
422
475
313
668
283
125
678
258
465
216
0.59
0.68
1.33
0.05
0.97
2.29
1.17
0.74
0.58
0.01
1.13
1.74
1.44
0.34
0.70
0.63
1.13
0.93
0.48
0.11
0.00
0.19
0.43
0.77
0.57
1.60
0.80
0.69
0.15
0.92
0.05
0.07
0.05
0.06
0.08
0.06
0.05
0.05
0.05
0.06
0.08
0.05
0.09
0.06
0.07
0.05
0.08
0.08
0.05
0.06
0.06
0.06
0.07
0.05
0.07
0.10
0.05
0.07
0.06
0.08
0.96
1.07
0.99
1.08
1.07
1.05
1.02
1.02
1.07
0.84
1.05
0.97
0.99
0.90
1.03
0.96
0.97
1.09
1.03
0.94
1.04
0.99
0.97
0.94
0.96
0.91
1.04
0.90
1.04
0.98
0.91
1.11
1.03
1.09
1.26
1.09
1.04
1.04
1.07
0.77
1.28
1.00
1.29
0.83
0.98
0.97
1.07
1.21
1.04
0.92
1.03
1.00
0.96
0.93
1.02
1.00
1.04
0.77
1.05
1.14
0.43
0.29
0.38
0.31
0.25
0.28
0.35
0.36
0.32
0.53
0.25
0.38
0.27
0.46
0.33
0.42
0.33
0.24
0.36
0.43
0.34
0.38
0.39
0.44
0.38
0.37
0.35
0.46
0.34
0.33
that all items work together, but several problems are noticeable: poor targeting of the test on the population, too small
separation of items in the middle of the test, too small width
of the test, and the lack of easy items.
C. Rasch analysis for the university sample
The FCI is usually used as pretest and post-test in introductory physics courses. The pretest population is in most
cases predominantly non-Newtonian with low average scores
on the FCI. The gymnasium sample in this study is an example of such population. However, after the instruction, the
post-test population is expected to become predominantly
Newtonian if the instruction is efficient in promoting conceptual understanding. It is therefore important to see how the
FCI functions on both kinds of populations non-Newtonian
and predominantly Newtonian. To check the functioning of
the FCI on the predominantly Newtonian sample of students
010103-7
FIG. 5. Item-person map for the university students. The lefthand side shows the distribution of student abilities and the righthand side shows the distribution of item difficulties. Items are labeled as Q1Q30. M, S, and T are labels for the mean value, one
standard deviation, and two standard deviations of each distribution.
Each x represents one student.
center of rotation, a force in the direction of the boys motion, and a force pointing away from the center of rotation
acts on the boy. All wrong answers include the force in the
direction of motion. The significant infit value of item 18
implies that some students of high ability, which means students who are predominantly Newtonians, have unexpectedly failed on this item and included in their answers the
force in the direction of the boys motion. That is surprising
because other FCI items in which students typically express
similar alternative ideas, such as items 13, 17, and 30, have
too small infit and outfit, meaning that they are even too
regular in discriminating between the students of low and
high abilities. So why do some Newtonians fail on item 18?
A possible reason could be that good students notice that
unlike in items 13, 17, and 30in item 18 there is actually a
force in the direction of motion, a component of the gravitational force. This might be a source of confusion for some
students. Item 18 should be further investigated and maybe
reformulated.
In item 29 students are asked which combination of the
three suggested forces a downward force of gravity, an upward force exerted by the floor, and the net downward force
of the air acts on the chair at rest. It seems that the net
downward force of the air was confusing for students. Some
otherwise successful students unexpectedly failed on this
item because they have included this force in their answers.
However, this item really does not test student understanding
of Newtons laws but rather their understanding of the effects
of atmospheric pressure on the bodies in air. It seems that
this item does not test the same variable as the rest of the
items. Although item 29 does not degrade measurement, the
FCI would probably be a more coherent instrument if this
item was excluded from the test.
Another problematic item is item 7. Item 7 basically tests
the same thing as item 6 student understanding of kinematics of circular motion, asking students to predict the trajectory of the object in circular motion after restraints which
enable circular motion are removed ball exits from the horizontal frictionless circular channel in item 6 or stone on a
string moving horizontally in a circular path flies off after the
string breaks in item 7. One would expect similar results on
both questions, however item 7 is 1.03 logit more difficult
than item 6. The relatively large outfit value of item 7 was
caused by a number of otherwise successful students who
have failed on this item but usually succeeded on item 6. It is
possible that in item 7 students were confused because they
knew that the trajectory of the stone after the string breaks
will not be a straight but a curved line. Some students asked
researchers during the testing what the figure in item 7 represented. There are indications that the situation described in
item 7, as well as its distracters, was not clear to all students.
From comparisons of item positions in Figs. 3 and 5 and
item measures from Tables II and III it is noticeable that
several items items 68, 10, 14, 17, 21, 26, and 28 significantly change their difficulty by more than three standard
errors from non-Newtonian to Newtonian sample, therefore
making the item difficulty order different in the two analyses.
Since clearly distinguishable item difficulty levels are at least
three standard errors apart 19, items which change their
position by more than three standard errors should be care-
010103-8
TABLE III. Statistical information for items for the Rasch analysis of the university sample. Displayed
are total raw score, item measure in logit, Rasch standard error, infit and outfit MNSQ statistics, and pointmeasure correlation for each item.
Entry No.
Total score
Measure
Rasch S.E.
Infit MNSQ
Outfit MNSQ
Correlation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
117
70
123
98
59
115
92
78
95
111
67
131
58
109
58
113
85
66
100
102
72
98
67
119
93
75
98
96
95
68
1.14
0.82
1.33
0.24
0.97
1.03
0.00
0.52
0.12
0.82
0.93
2.23
1.27
0.73
1.27
0.93
0.26
0.96
0.33
0.41
0.74
0.19
0.93
1.26
0.04
0.63
0.24
0.16
0.12
0.89
0.24
0.19
0.25
0.20
0.19
0.23
0.19
0.19
0.20
0.22
0.19
0.33
0.19
0.21
0.19
0.22
0.19
0.19
0.20
0.20
0.19
0.20
0.19
0.24
0.19
0.19
0.20
0.20
0.20
0.19
1.01
0.87
0.96
1.05
0.99
1.01
1.13
1.09
1.10
0.85
0.96
0.99
0.80
1.07
1.10
0.94
0.96
1.20
1.12
0.99
1.15
0.96
1.10
0.86
0.84
0.81
1.10
0.77
1.18
0.83
0.86
0.84
1.10
1.05
1.04
0.83
1.33
1.11
1.18
0.67
0.93
0.73
0.75
0.90
1.16
0.96
0.86
1.39
1.22
1.20
1.17
0.83
1.10
0.59
0.73
0.72
1.08
0.62
1.34
0.75
0.35
0.57
0.24
0.39
0.50
0.37
0.35
0.42
0.36
0.48
0.53
0.26
0.63
0.36
0.45
0.39
0.50
0.37
0.34
0.39
0.40
0.46
0.44
0.44
0.55
0.61
0.37
0.58
0.31
0.60
010103-9
FIG. 6. Color The bubble chart for the university students showing outfit mean square statistics MNSQ vs item measure.
very closely spaced items in the middle of the test and not
enough items at the extremes of the test. This could be remedied by removing some of the items from the middle of the
test and adding new items at the extremes. A possible solution could also be the construction of two different tests, one
for pretest and another for posttest, which could be linked by
a certain number of common items to enable the comparison
of student scores on both tests. The Rasch model could provide linear measures of student abilities on both tests on the
same scale, and the difference in ability measures on pretest
and posttest could be used as the direct measure of student
gain. Hakes normalized gain 2, which is generally used as
the measure of student progress in the course, may also be
influenced by the nonlinearity of raw scores expressed as
percentages.
The Force Concept Inventory is an important instrument
which has contributed very much to the development of
physics education research and to the change of physics
teaching practice throughout the world. It is a measurement
instrument in physics education research, and as such it has
to be calibrated, inspected, and carefully monitored, just like
measurement instruments in physics. The Rasch model can
be used as a powerful tool for monitoring and improving the
functioning of the FCI, as well as of other diagnostic tests
used in PER.
ACKNOWLEDGMENTS
010103-10
11 R. K. Thornton, D. Kuhl, K. Cummings, and J. Marx, Comparing the force and motion conceptual evaluation and the
force concept inventory, Phys. Rev. ST Phys. Educ. Res. 5,
010105 2009.
12 V. P. Coletta, J. A. Phillips, and J. J. Steinert, Interpreting force
concept inventory scores: Normalized gain and SAT scores,
Phys. Rev. ST Phys. Educ. Res. 3, 010106 2007.
13 V. P. Coletta and J. A. Phillips, Interpreting FCI scores: Normalized gain, preinstruction scores, and scientific reasoning
ability, Am. J. Phys. 73, 1172 2005.
14 G. Rasch, Probabilistic Models for Some Intelligence and Attainment Tests Danmarks Paedagogiske, Copenhagen, 1960.
15 M. Planinic, L. Ivanjek, A. Susac, P. Pecina, R. Krsnik, M.
Planinic, Z. Jakopovic, and V. Halusek, GIREP-EPEC Conference 2007: Selected Contributions Zlatni rez, Rijeka, 2008,
pp. 295300.
16 J. M. Linacre, WINSTEPS Rasch measurement computer program, Winsteps.com, Chicago, 2006.
17 R. Dittrich and R. Hatzinger, Using GLIM for computing the
Rasch model and the corresponding multiplicative Poisson
model, http://www.ihs.ac.at/publications/ihsfo/fo167.pdf
18 B. D. Wright and M. H. Stone, Best Test Design MESA Press,
Chicago, IL, 1979.
19 T. G. Bond and C. M. Fox, Applying the Rasch Model: Fundamental Measurement in the Human Sciences Lawrence Erlbaum, Mahwah, NJ, 2001.
20 J. M. Linacre, A userss guide to WINSTEPS, www.winsteps.com
010103-11