A R T I C L E I N F O
A B S T R A C T
Article history:
Received 26 January 2009
Received in revised form 24 April 2009
Accepted 29 April 2009
Content validation theory and practice have received considerable attention in the nursing
research literature. This paper positions the discourse within the broader scientific
literature on validity of measurement. The content validity index has been recommended
as a means to quantify content validity; this paper critically examines its origins,
theoretical interpretations, and statistical properties. In addition, the author sets out to
understand why many nurse researchers are occupied with content validity and its
estimation. This investigation may be of interest to scholars who wish to understand more
deeply the issues surrounding validity of measurement.
© 2009 Elsevier Ltd. All rights reserved.
Keywords:
Validity
Content validity
Operational definition
Interrater agreement
Table 1
Frequency distributions, scale values, information transmitted, and measures of multirater agreement for four items rated by five experts.

Upper panel: data as obtained

      Response categories
Item  1   2   3   4   Scale value   Agreement   Discounted agreement
A     4   1   0   0   0.00          0.60        0.47
B     1   3   1   0   1.31          0.30        0.07
C     0   1   3   1   3.37          0.30        0.07
D     0   0   1   4   4.68          0.60        0.47

Variance in scale values = 3.27
Information transmitted = 1.05 bits
Mean proportion agreement = 0.45
Mean proportion agreement due to chance = 0.25
Multirater kappa = 0.27

Lower panel: same data after collapsing response categories

      Response categories
Item  1 and 2   3 and 4   Scale value   Agreement   Discounted agreement
A     5         0         0.00          1.00        1.00
B     4         1         1.12          0.60        0.20
C     1         4         1.97          0.60        0.20
D     0         5         3.09          1.00        1.00

Variance in scale values = 1.28
Information transmitted = 0.60 bits
Mean interrater agreement = 0.80
Mean proportion agreement due to chance = 0.50
Multirater kappa = 0.60

Note: Upper panel shows analysis of data as obtained. Lower panel shows analysis of the same data collapsing over response categories. Original response categories: 1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, and 4 = very relevant. Scale values determined using the method of successive categories (Guilford, 1954). Information transmitted = (1/2)[log2(variance + 1)]. Agreement, mean proportion agreement, mean proportion agreement due to chance, and multirater kappa calculated using Fleiss (1971) Equations 2, 3, 5, and 7, respectively. Discounted agreement = (agreement − mean proportion agreement due to chance)/(1 − mean proportion agreement due to chance).
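As a check on these quantities, the following minimal Python sketch (not part of the original article) reproduces the upper-panel statistics from the item-by-category frequencies shown above, using Fleiss (1971) Equations 2, 3, 5, and 7 and the information-transmitted index defined in the table note.

import math

# rows = items A-D; columns = response categories 1-4; five raters per item
freq = [
    [4, 1, 0, 0],  # A
    [1, 3, 1, 0],  # B
    [0, 1, 3, 1],  # C
    [0, 0, 1, 4],  # D
]
n = 5  # raters (experts) per item

# Per-item proportion agreement (Fleiss, 1971, Eq. 2)
P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in freq]
P_bar = sum(P) / len(P)  # mean proportion agreement (Eq. 3)
p_j = [sum(col) / (n * len(freq)) for col in zip(*freq)]  # category proportions
P_e = sum(p * p for p in p_j)  # mean proportion agreement due to chance (Eq. 5)
kappa = (P_bar - P_e) / (1 - P_e)  # multirater kappa (Eq. 7)
discounted = [(Pi - P_e) / (1 - P_e) for Pi in P]

# Information transmitted from the variance of the item scale values
scale = [0.00, 1.31, 3.37, 4.68]
m = sum(scale) / len(scale)
var = sum((s - m) ** 2 for s in scale) / len(scale)
info = 0.5 * math.log2(var + 1)

print([round(x, 2) for x in P])                          # [0.6, 0.3, 0.3, 0.6]
print(round(P_bar, 2), round(P_e, 2), round(kappa, 2))   # 0.45 0.25 0.27
print([round(x, 2) for x in discounted])                 # [0.47, 0.07, 0.07, 0.47]
print(round(var, 2), round(info, 2))                     # 3.27 1.05

Re-running the same script with freq = [[5, 0], [4, 1], [1, 4], [0, 5]] and scale = [0.00, 1.12, 1.97, 3.09] reproduces the lower panel (mean agreement 0.80, chance agreement 0.50, kappa 0.60, 0.60 bits transmitted), making plain how collapsing inflates apparent agreement while discarding information.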
Table 2
Evaluation of I-CVIs with different numbers of experts and agreement showing lower-bound confidence limits.

(1)             (2)           (3)     (4)       (5)     (6)          (7)           (8)
No. of experts  No. agreeing  I-CVI   pc*       k*      Evaluation   95% L.C.L.    Evaluation of 95% L.C.L.
3               3             1.000   0.125     1.000   Excellent    0.292         Poor
3               2             0.667   0.375     0.467   Fair         0.094         Poor
4               4             1.000   0.063     1.000   Excellent    0.398         Poor
4               3             0.750   0.250     0.667   Good         0.194         Poor
5               5             1.000   0.031     1.000   Excellent    0.478         Fair
5               4             0.800   0.156     0.763   Excellent    0.284         Poor
6               6             1.000   0.016     1.000   Excellent    0.541         Fair
6               5             0.833   0.094     0.816   Excellent    0.359         Poor
6               4             0.667   0.234     0.565   Fair         0.223         Poor
7               7             1.000   0.008     1.000   Excellent    0.590         Fair
7               6             0.857   0.055     0.849   Excellent    0.421         Fair
7               5             0.714   0.164     0.658   Good         0.290         Poor
8               8             1.000   0.004     1.000   Excellent    0.631         Good
8               7             0.875   0.031     0.871   Excellent    0.473         Fair
8               6             0.750   0.109     0.719   Good         0.349         Poor
9               9             1.000   0.002     1.000   Excellent    0.664         Good
9               8             0.889   0.018     0.887   Excellent    0.518         Fair
9               7             0.778   0.070     0.761   Excellent    0.400         Poor
12              12            1.000   2.44E−04  1.000   Excellent    0.735         Good
13              13            1.000   1.22E−04  1.000   Excellent    0.753         Excellent
13              12            0.923   1.59E−03  0.923   Excellent    0.640         Good
20              19            0.950   1.91E−05  0.950   Excellent    0.751         Excellent
20              18            0.900   1.81E−04  0.900   Excellent    0.683         Good
35              35            1.000   2.91E−11  1.000   Excellent    0.900         Excellent
35              34            0.971   1.02E−09  0.971   Excellent    0.851         Excellent
54              53            0.981   3.00E−15  0.981   Excellent    0.901         Excellent
54              52            0.963   7.94E−14  0.963   Excellent    0.873         Excellent

Note: Values in the upper portion of columns 1–6 are after those found in Polit et al. (2007, Table 4). I-CVI = item-level content validity index; pc* = probability of chance agreement; k* = modified kappa coefficient designating proportion agreement on relevance (Polit et al., 2007). Evaluation criteria for kappa: poor < 0.40, fair = 0.40–0.599, good = 0.60–0.749, excellent ≥ 0.75 (Fleiss, 1981). 95% L.C.L. = ninety-five percent lower-bound confidence limit for k* based on confidence limits for a population proportion (Zar, 1999, p. 527).
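The tabled quantities follow directly from the definitions in the note. The short Python sketch below is illustrative (not the author's code); it assumes the exact (Clopper-Pearson) lower confidence limit for a binomial proportion, which is the beta-quantile form of the F-distribution limits given by Zar (1999) and reproduces the tabled 95% L.C.L. values.

from math import comb
from scipy.stats import beta

def icvi_row(n_experts, n_agree, alpha=0.05):
    """Return I-CVI, pc*, k*, and the exact lower confidence limit (n_agree >= 1)."""
    icvi = n_agree / n_experts
    pc = comb(n_experts, n_agree) * 0.5 ** n_experts  # probability of chance agreement
    kappa_star = (icvi - pc) / (1 - pc)               # modified kappa of Polit et al. (2007)
    # exact (Clopper-Pearson) lower limit for a binomial proportion
    lcl = beta.ppf(alpha / 2, n_agree, n_experts - n_agree + 1)
    return icvi, pc, kappa_star, lcl

# Five of six experts agreeing: I-CVI = 0.833, pc* = 0.094,
# k* = 0.816, 95% L.C.L. = 0.359 (row 8 of the table)
print(icvi_row(6, 5))

For the large panels in the lower portion (e.g., 34 of 35 experts), pc* is vanishingly small, so k* is essentially the I-CVI itself, and only there does the lower confidence limit climb into the excellent range.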
Measurement specialists should abandon the term content validity and instead speak specifically to issues such as domain clarity and the adequacy of content domain sampling when discussing instrument development.

The various methods for quantifying what has been called content validity in the nursing research literature were shown to be deficient on one or more of the following technical grounds: they distort interrater agreement and discard information by collapsing response categories, they mis-specify the statistical model of interrater agreement, they do not adequately correct for chance agreement, and they neglect the huge sampling errors incurred by the use of small samples of experts. If nurse researchers feel it necessary to seek expert opinion on item relevance as part of instrument development, then large samples of experts should be employed, response categories should never be collapsed after the fact, indices such as multirater kappa should be used along with their standard errors to examine interrater agreement, and the results should be interpreted as addressing not validity but the acceptability of an operational definition.
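To make the sampling-error point concrete, the following Monte Carlo sketch (illustrative only; the panel sizes and the 0.80 cutoff are chosen for this example and are not from the article) estimates how often experts who rate item relevance purely at random, each endorsing "relevant" with probability 0.5, nevertheless produce an I-CVI of 0.80 or higher.

import random

def chance_icvi_rate(n_experts, cutoff=0.80, trials=100_000):
    """Proportion of trials in which purely random ratings yield I-CVI >= cutoff."""
    hits = 0
    for _ in range(trials):
        # count experts rating the item "relevant" by coin flip
        agree = sum(random.random() < 0.5 for _ in range(n_experts))
        if agree / n_experts >= cutoff:
            hits += 1
    return hits / trials

for n in (5, 9, 20):
    print(n, round(chance_icvi_rate(n), 3))
# Binomial expectations: n=5 -> 0.188, n=9 -> 0.020, n=20 -> 0.006

With only five experts, chance alone clears the conventional 0.80 cutoff nearly one time in five; with twenty experts it almost never does.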
Nursing comprises the application and adaptation of established scientific knowledge to the promotion, improvement, and maintenance of human health and well-being (Beckstead and Beckstead, 2006). The field of psychology has much to offer nursing in this regard. Theories from clinical psychology have already influenced many nurse scholars. Humanistic psychologists Carl Rogers and Abraham Maslow shared an optimistic view of people as capable of self-care and self-determination given a secure, nurturing environment, themes that pervade many nursing theories. Beckstead and Beckstead showed that the ideas of these and other psychologists have been productively incorporated into the thinking of various nurse scholars. The current article highlights how the ideas of some distinguished methodologists in the field of psychology have shifted attention from content to construct validity. The ideas of Cronbach and Meehl regarding construct validity, introduced into psychology some 50 years ago, have served to productively redirect intellectual energies in that field. Nurse scholars can benefit from these ideas as well. Focusing our attention on the attributes or processes that underlie the individual differences we see in our own data (collected from patients, students, or colleagues), rather than debating how best to quantify what too small a sample of experts thinks about an arbitrary operational definition, seems a much more productive activity for advancing nursing as a scientific enterprise.
Conflict of interest. None declared.
Funding. None.
References
American Psychological Association, American Educational Research
Association & National Council on Measurements Used in Education,
1954. Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin 51 (2), 201–238.
American Psychological Association, American Educational Research
Association & National Council on Measurements Used in Education,
1974. Standards for Educational and Psychological Tests. American
Psychological Association, Washington.
Waltz, C.F., Bausell, R.B., 1981. Nursing Research: Design, Statistics and
Computer Analysis. F.A. Davis, Philadelphia.
Wynd, C.A., Schmidt, B., Schaefer, M.A., 2003. Two quantitative approaches for estimating content validity. Western Journal of Nursing
Research 25 (5), 508–518.
Zar, J.H., 1999. Biostatistical Analysis, 4th ed. Prentice Hall, Upper Saddle
River, NJ.