

1. American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education. (1999). Standards for
Educational and Psychological Testing. Washington, DC: American Educational
Research Association.
2. Büttner J. (1997) Diagnostic validity as a theoretical concept and as a
measurable quantity. Clin Chim Acta. Apr 25;260(2):131-43. PMID 9177909
3. Ogince M, Hall T, Robinson K, Blackmore AM. (2007) The diagnostic validity
of the cervical flexion-rotation test in C1/2-related cervicogenic headache. Man
Ther. Aug;12(3):256-62. PMID 17112768
4. Kendell R. & Jablensky A. (2003) Distinguishing Between the Validity
and Utility of Psychiatric Diagnoses. Am J Psychiatry. January;160(1):4-12. PMID
5. Kenneth S. Kendler (2006) Reflections on the Relationship Between Psychiatric
Genetics and Psychiatric Nosology. Am J Psychiatry 163:1138-1146, July 2006.
PMID 16816216

External links

• Cronbach, L. J. & Meehl, P. E. (1955). Construct validity in psychological tests.
Psychological Bulletin, 52, 281-302.

Criterion validity

Criterion validity evidence involves the correlation between the test and a criterion
variable (or variables) taken as representative of the construct. In other words, it
compares the test with other measures or outcomes (the criteria) already held to be valid.
For example, employee selection tests are often validated against measures of job
performance (the criterion), and IQ tests are often validated against measures of academic
performance (the criterion).

If the test data and criterion data are collected at the same time, this is referred to as
concurrent validity evidence. If the test data are collected first in order to predict
criterion data collected at a later point in time, this is referred to as predictive
validity evidence.

Predictive validity
In psychometrics, predictive validity is the extent to which a score on a scale or test
predicts scores on some criterion measure.[1]

For example, the validity of a cognitive test for job performance is the correlation
between test scores and a criterion such as supervisor performance ratings. Such a
cognitive test would have predictive validity if the observed correlation were
statistically significant.

Predictive validity shares similarities with concurrent validity in that both are generally
measured as correlations between a test and some criterion measure. In a study of
concurrent validity the test is administered at the same time as the criterion is collected.
This is a common method of developing validity evidence for employment tests: A test is
administered to incumbent employees, then a rating of those employees' job performance
is obtained (often, as noted above, in the form of a supervisor rating). Note the possibility
for restriction of range both in test scores and performance scores: The incumbent
employees are likely to be a more homogeneous and higher performing group than the
applicant pool at large.
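
The restriction-of-range effect noted above can be illustrated with a small simulation.
This is only a sketch under stated assumptions: the applicant-pool correlation of 0.5,
the pool size, and the rule that only the top 30% of test scorers become incumbents are
all hypothetical values chosen for illustration, not figures from the text.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)      # fixed seed so the run is repeatable
TRUE_R = 0.5                # assumed applicant-pool correlation (hypothetical)
N_APPLICANTS = 20_000       # assumed pool size (hypothetical)

# Simulate standardized test scores and job performance correlated at TRUE_R.
test = [rng.gauss(0, 1) for _ in range(N_APPLICANTS)]
perf = [TRUE_R * t + math.sqrt(1 - TRUE_R ** 2) * rng.gauss(0, 1) for t in test]

# Hypothetical hiring rule: only the top 30% of test scorers become incumbents.
cutoff = sorted(test)[int(0.7 * N_APPLICANTS)]
incumbents = [(t, p) for t, p in zip(test, perf) if t >= cutoff]
inc_test = [t for t, _ in incumbents]
inc_perf = [p for _, p in incumbents]

r_applicants = pearson_r(test, perf)
r_incumbents = pearson_r(inc_test, inc_perf)
print(f"applicant pool r  = {r_applicants:.2f}")
print(f"incumbents only r = {r_incumbents:.2f}")
```

Under these assumptions the incumbent-only correlation comes out noticeably lower than
the applicant-pool correlation, because the incumbents span a narrower range of test
scores.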

In a study of predictive validity, the test scores are collected first; then at some later time
the criterion measure is collected. Here the example is slightly different: Tests are
administered, perhaps to job applicants, and then after those individuals work in the job
for a year, their test scores are correlated with their first-year job performance scores.
Another relevant example is SAT scores: These are validated by collecting the scores
during the examinee's senior year of high school and then waiting a year (or more) to
correlate the scores with their first-year college grade point average. Thus predictive
validity provides somewhat more useful data about test validity because it has greater
fidelity to the real situation in which the test will be used. After all, most tests are
administered to find out something about future behavior.

As with many aspects of social science, the magnitude of the correlations obtained from
predictive validity studies is usually not high. A typical predictive validity study for an
employment test might obtain a correlation in the neighborhood of r = .35. Higher values
are occasionally seen and lower values are very common. Nonetheless, the utility (that is,
the benefit obtained by making decisions using the test) provided by a test with a
correlation of .35 can be quite substantial.
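
One way to see why a correlation of .35 can still be useful is to estimate the average
performance of the people a test would select. The sketch below assumes that test and
performance scores are standardized and bivariate normal, and that the top 20% of
applicants are hired; the 20% selection ratio and the function name `expected_gain` are
hypothetical choices made for illustration.

```python
from statistics import NormalDist

def expected_gain(validity, selection_ratio):
    """Expected mean criterion z-score of those selected top-down on the test.

    Assumes test and criterion are standardized and bivariate normal, so the
    mean criterion score of the selected group equals the validity coefficient
    times the group's mean test z-score.
    """
    if not 0 < selection_ratio < 1:
        raise ValueError("selection_ratio must be between 0 and 1")
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - selection_ratio)
    # Mean z-score of a standard normal variable truncated below at the cutoff.
    mean_test_z = std_normal.pdf(cutoff) / selection_ratio
    return validity * mean_test_z

gain = expected_gain(validity=0.35, selection_ratio=0.20)  # ratio is hypothetical
print(f"expected performance gain: {gain:.2f} SD above the applicant mean")
```

Under these assumptions the selected group is expected to perform about 0.49 standard
deviations above the applicant average, whereas hiring at random would give an expected
gain of zero.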

Content validity

Content validity is a non-statistical type of validity that involves “the systematic
examination of the test content to determine whether it covers a representative sample of
the behavior domain to be measured” (Anastasi & Urbina, 1997, p. 114). For example,
does an IQ questionnaire have items covering all areas of intelligence discussed in the
scientific literature?

Content validity evidence involves the degree to which the content of the test matches a
content domain associated with the construct. For example, a test of the ability to add two
numbers should include a range of combinations of digits. A test with only one-digit
numbers, or only even numbers, would not have good coverage of the content domain.
Content-related evidence typically involves subject matter experts (SMEs) evaluating test
items against the test specifications.

A test has content validity built into it by careful selection of which items to include
(Anastasi & Urbina, 1997). Items are chosen so that they comply with the test
specification, which is drawn up through a thorough examination of the subject domain.
Foxcroft et al. (2004, p. 49) note that the content validity of a test can be improved by
using a panel of experts to review the test specifications and the selection of items.
The experts can review the items and comment on whether they cover a representative
sample of the behaviour domain.

Representation validity

Representation validity, also known as translation validity, is about the extent to which an
abstract theoretical construct can be turned into a specific practical test.

Face validity

Face validity is an estimate of whether a test appears to measure a certain criterion; it
does not guarantee that the test actually measures phenomena in that domain. Indeed,
when a test is subject to faking (malingering), low face validity might make the test more
valid.

Face validity is very closely related to content validity. Content validity depends on
a theoretical basis for judging whether a test assesses all domains of a certain criterion
(e.g., does assessing addition skills yield a good measure of mathematical skills? To
answer this, one has to know which kinds of arithmetic skills mathematical skills
include). Face validity, by contrast, relates to whether a test appears to be a good
measure or not. This judgment is made on the "face" of the test, so it can also be made
by an amateur.

Face validity is a starting point, but a test should never be assumed to be valid for
any given purpose on that basis alone, as the "experts" have been wrong before: the
Malleus Maleficarum (Hammer of Witches) had no support for its conclusions other than
the self-imagined competence of two "experts" in "witchcraft detection," yet it was used
as a "test" to condemn and burn at the stake perhaps 100,000 women as "witches."

Pearson's Correlation (1 of 3)
The correlation between two variables reflects the degree to which the variables are
related. The most common measure of correlation is the Pearson Product Moment
Correlation (called Pearson's correlation for short). When measured in a population the
Pearson Product Moment correlation is designated by the Greek letter rho (ρ). When
computed in a sample, it is designated by the letter "r" and is sometimes called "Pearson's
r." Pearson's correlation reflects the degree of linear relationship between two variables.
It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear
relationship between variables. The scatterplot shown on this page depicts such a
relationship. It is a positive relationship because high scores on the X-axis are associated
with high scores on the Y-axis.
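
As a rough illustration of the sample statistic r described above, the following sketch
computes Pearson's correlation from deviations about the means. The function name
`pearson_r` and the small data set are hypothetical, chosen so that the points lie exactly
on a rising straight line.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]   # every point lies on a rising straight line
r_up = pearson_r(x, y_up)
print(r_up)  # 1.0
```

Because every point falls on the same upward-sloping line, r equals +1 regardless of the
line's slope or intercept.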

Pearson's Correlation (2 of 3)
A correlation of -1 means that there is a perfect negative linear relationship between
variables. The scatterplot shown below depicts a negative relationship. It is a negative
relationship because high scores on the X-axis are associated with low scores on the
Y-axis.

A correlation of 0 means there is no linear relationship between the two variables. The
second graph shows a Pearson correlation of 0.
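
The negative and zero cases can be checked numerically. The sketch below uses an
equivalent way of writing Pearson's r, as the mean product of paired z-scores, and two
hypothetical data sets: one lying on a falling straight line, and one in which Y depends
on X through a curve rather than a line.

```python
import statistics

def pearson_r_from_z(xs, ys):
    """Pearson's r as the mean product of paired z-scores (population SDs)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    ]
    return statistics.fmean(products)

x = [-2, -1, 0, 1, 2]
y_down = [10 - 3 * v for v in x]   # falls by 3 for every unit increase in x
y_curve = [v ** 2 for v in x]      # depends on x, but not linearly

r_down = pearson_r_from_z(x, y_down)
r_curve = pearson_r_from_z(x, y_curve)
print(f"falling line: r = {r_down:.3f}")
print(f"symmetric curve: |r| = {abs(r_curve):.3f}")
```

The second data set shows why the definition says "no linear relationship": Y is
completely determined by X there, yet the correlation is 0 because the relationship is
not linear.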
Correlations are rarely if ever 0, 1, or -1. Some real data showing a moderately high
correlation are shown in the next part (3 of 3).

Pearson's Correlation (3 of 3)

The scatterplot below shows arm strength as a function of grip strength for 147 people
working in physically demanding jobs. The plot reveals a strong positive relationship.
The value of Pearson's correlation is 0.63.
