1. Introduction
When a person is tested or observed multiple times, such as a student tested for mathematics achievement or a Navy machinist mate observed while operating engine room
equipment, scores reflecting his or her performance may or may not agree. Not only may
individuals' scores vary from one testing to another, calling into question the defensibility of using only one score for decision-making purposes, but the rankings of individuals
may also disagree. The concern of reliability studies is to estimate the consistency of
scores across repeated observations. Reliability coefficients quantify the consistency
among the multiple measurements on a scale from 0 to 1.
In this chapter we present reliability coefficients as developed in the framework of
classical test theory, and describe how the conception and estimation of reliability were
broadened in generalizability theory. Section 2 briefly sketches foundations of classical
test theory (see the chapter by Lewis for a thorough development of the theory) and focuses on traditional methods of estimating reliability. Section 3 reviews generalizability
theory, including applications and recent theoretical contributions.
Classical test theory's reliability coefficients are widely used in behavioral and social
research. Each provides an index of measurement consistency ranging from 0 to 1.00,
and its interpretation, at first blush, is relatively straightforward: the proportion of
observed-score variance attributable to true scores (stable or nonrandom individual differences) (see the chapter by Lewis for definitions in classical test theory). Coefficients at or
above 0.80 are often considered sufficiently reliable to make decisions about individuals based on their observed scores, although a higher value, perhaps 0.90, is preferred
if the decisions have significant consequences. Of course, reliability is never the sole
consideration in decisions about the appropriateness of test uses or interpretations.
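The proportion-of-variance interpretation above can be sketched as a formula. This assumes the standard classical decomposition of an observed score X into a true score T and an error E that is uncorrelated with T; the symbols are illustrative notation, not definitions taken from this section.

```latex
% Assumed notation: X = observed score, T = true score, E = error,
% with T and E uncorrelated so that the variances add.
\[
  X = T + E, \qquad
  \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
  \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
             = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}.
\]
```

Under these assumptions, a coefficient of 0.80 would mean that 80% of the observed-score variance is attributable to true-score differences and 20% to error.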
Coefficient alpha (also known as Cronbach's alpha) is perhaps the most widely
used reliability coefficient. It estimates test-score reliability from a single test administration using information from the relationship among test items. That is, it provides an