
TEST VALIDITY refers to the extent to which a test actually measures what it claims to measure.

It is important to remember that the actual focus of validity is on the INTERPRETATION of test scores, not on the test itself.
Principles for Validation
1. Validity concerns the interpretation and use of assessment results, not the assessment itself.
2. Interpretation is based on appropriate evidence.
3. Use of assessment is based on appropriate evidence.
4. Interpretation and use must be based on appropriate values.
5. There are intended and unintended consequences of the interpretation and use of assessment results.
Content Validity
•Content validity deals with the extent to which a test measures the teacher's instructional objectives.
–The content validity of a particular test is established by having content-area experts review the test items.
•Content validity gives information about whether the test "looks valid."
Criterion-related Validity
•Criterion-related validity demonstrates a relationship between performance on the test and performance on some external criterion.
•Concurrent validity assesses the extent to which a test may be used to estimate (or is associated with) an individual's CURRENT STANDING on the criterion variable.
–Example: Terra Nova scores are compared with SAT and ACT scores, as well as with results from the National Assessment of Educational Progress (NAEP) and the Third International Mathematics and Science Study (TIMSS).
Concurrent Validity
•Concurrent validity is especially important to establish when you are interested in validating a new test that measures the same skills as existing tests.
•The new test may be shorter or more easily administered than currently existing tests.
•One usually looks for validity coefficients of .80 or higher.
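As a minimal sketch of how such a coefficient is computed, the example below correlates invented scores from a hypothetical new test with scores from an established test taken by the same students at about the same time; the data and the .80 check are illustrative assumptions, not results from any real instrument.

```python
# Hypothetical sketch: estimating a concurrent validity coefficient by
# correlating scores on a new test with scores on an established test
# taken by the same students at about the same time.
from scipy.stats import pearsonr

new_test = [78, 85, 62, 90, 71, 88, 67, 95, 74, 81]          # shorter new test
established_test = [75, 88, 60, 93, 70, 85, 65, 97, 72, 80]  # well-accepted anchor test

r, p = pearsonr(new_test, established_test)
print(f"Concurrent validity coefficient: r = {r:.2f} (p = {p:.3f})")

# Rule of thumb from the slide: look for r >= .80.
if r >= 0.80:
    print("Meets the conventional .80 benchmark.")
```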
Predictive Validity
•Predictive validity indicates the extent to which an individual's FUTURE level of performance on some criterion variable can be predicted from current test data.
•Generally, one looks for predictive validity coefficients of .60 or higher.
•The important thing to remember about predictive validity is that a substantial time interval occurs between the administration of the test being validated and the gathering of the criterion data.
•With concurrent validity, the test scores and the criterion data are collected at approximately the same time.
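A parallel sketch for predictive validity follows; the only structural difference from the concurrent case is the time lag, which lives in the study design rather than in the arithmetic. All numbers are invented for illustration.

```python
# Hypothetical sketch: a predictive validity coefficient. Test scores are
# gathered now; the criterion (here, first-year GPA) is gathered after a
# substantial time interval. The correlation itself is computed the same way.
import numpy as np

admission_test = np.array([520, 640, 580, 710, 490, 600, 670, 550])  # fall testing
first_year_gpa = np.array([2.6, 3.4, 3.0, 3.8, 2.4, 3.1, 3.3, 2.9])  # one year later

r = np.corrcoef(admission_test, first_year_gpa)[0, 1]
print(f"Predictive validity coefficient: r = {r:.2f}")  # benchmark: r >= .60
```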
Predicting Performance on the Florida Teacher Certification Examination
(Villeme, Hall, & Phillipy, 1985)

                       FTCE-Math   FTCE-Reading   FTCE-Writing
GPA                      .39          .25           .14 ns
ACT-English              .53          .54           .45
ACT-Math                 .61          .35           .20 ns
ACT-Social Science       .51          .50           .47
ACT-Natural Science      .51          .54           .37
ACT-Composite            .67          .58           .45

(ns = correlation not statistically significant)
Type of Validity                 Anchor
Content validity                 the instructional objectives
Criterion-related validity:
  Concurrent validity            another well-accepted test for measuring the same thing
  Predictive validity            some future behavior or condition

What happens if we develop a test to measure something not previously measured, or not measured well, and no criterion exists for anchoring the test?
Construct Validity
•Construct validity refers to the extent to which a test measures a theoretical construct or trait.
•In establishing construct validity, we expect the test both to predict behaviors that should be related to test scores (convergent validity) and not to predict behaviors that should be unrelated to test scores (discriminant validity).
•With construct validity, no objective criterion exists for anchoring the test.
–Unlike concurrent validity, there is no second measure available of the behavior the test is attempting to assess.
–Unlike predictive validity, there is no measure of future behavior available.
–Example: The Terra Nova uses a series of statements of skills, concepts, and processes to measure subject areas. The test is then compared to these statements and evaluated by teachers, curriculum experts, and other educators.
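A toy numerical sketch of the convergent/discriminant logic follows; the scales and scores are invented, and a real study would examine a full matrix of such correlations rather than two.

```python
# Hypothetical convergent/discriminant check for a new anxiety scale.
from scipy.stats import pearsonr

anxiety_scale   = [12, 25, 18, 30, 9, 22, 15, 27]  # the new test
worry_inventory = [14, 27, 17, 32, 8, 20, 16, 29]  # theoretically related measure
shoe_size       = [9, 10, 8, 9, 10, 8, 9, 10]      # theoretically unrelated measure

convergent_r, _ = pearsonr(anxiety_scale, worry_inventory)
discriminant_r, _ = pearsonr(anxiety_scale, shoe_size)

print(f"Convergent validity:   r = {convergent_r:.2f}  (should be high)")
print(f"Discriminant validity: r = {discriminant_r:.2f}  (should be near zero)")
```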
Interpreting Validity Coefficients
•In general, the higher the validity coefficient, the more valid the test is.
•Consider the type of validity being reported:
–Concurrent validity coefficients are generally higher than predictive validity coefficients.
•This is because predictive validity coefficients require a time interval for their determination, and a lot can change in the intervening years.
–Concurrent validity coefficients above .80 are good.
–Predictive validity coefficients above .60 are good.
Group Variability
•Correlation coefficients increase as group variability increases.
–Higher validity coefficients are obtained from heterogeneous groups than from homogeneous groups, as the simulation below illustrates.
–(Remember what happens to truncated scores.)
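A small simulation can make the truncation point concrete. The data-generating model below (a linear relation with normal noise) is an illustrative assumption; any similar setup shows the same shrinkage.

```python
# Hypothetical simulation of range restriction: the same underlying
# relationship yields a smaller correlation in a truncated (homogeneous)
# group than in the full (heterogeneous) group.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
predictor = rng.normal(100, 15, n)                   # full range of ability
criterion = 0.6 * predictor + rng.normal(0, 12, n)   # built-in relation, r ≈ .60

full_r = np.corrcoef(predictor, criterion)[0, 1]

# Truncate: keep only examinees above the predictor mean.
keep = predictor > 100
truncated_r = np.corrcoef(predictor[keep], criterion[keep])[0, 1]

print(f"Heterogeneous group:           r = {full_r:.2f}")
print(f"Homogeneous (truncated) group: r = {truncated_r:.2f}")  # noticeably smaller
```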
Adequacy of the Criterion
•Whenever predictive or construct validity is at issue, remember that the size of the validity coefficient depends on the reliability of both the predictor and the criterion measures.
•Example: You have just created a paper-pencil measure of teaching effectiveness. What criterion will you validate your measure against: teacher salary, student achievement test scores, principal's ratings, peer ratings?
Criterion Validity
The point is that the criterion you validate your measure against will greatly influence the size of the resulting validity coefficient; the sketch below shows how unreliability in the measures caps that coefficient.
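One classical way to express this dependence is Spearman's correction for attenuation, r_true = r_observed / sqrt(r_xx × r_yy). The sketch below plugs in invented reliabilities for the teaching-effectiveness example; the specific values are assumptions for illustration.

```python
# Hypothetical sketch: how predictor and criterion unreliability cap the
# observed validity coefficient, via Spearman's correction for attenuation.
import math

observed_r = 0.45   # observed validity of the paper-pencil measure
r_predictor = 0.85  # assumed reliability of the paper-pencil measure
r_criterion = 0.60  # assumed reliability of the criterion (e.g., principal ratings)

corrected_r = observed_r / math.sqrt(r_predictor * r_criterion)
print(f"Validity corrected for attenuation: r = {corrected_r:.2f}")  # ≈ .63
```

A noisier criterion (say, a reliability of .40 for peer ratings) would cap the observable validity even lower, which is why the choice of criterion matters so much.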
Remember!
1. The concept of validity applies to the ways in which we interpret and use the assessment results, not to the assessment procedure itself.
2. Assessment results have different degrees of validity for different purposes and for different situations.
3. Judgments about the validity of your interpretations or uses of assessment results should be made only after you have studied and combined several types of validity evidence.
Classification
•For each of the following tests, which type of validity evidence is most important?
–Aptitude test (CogAT)
–Achievement test (ITBS)
–Personality test (TAT)
–Intelligence test (WISC)
–Test of Creativity (TTCT)


•What type of validity evidence is most important for classroom tests?
–How would you test the validity of a teacher-made test?
•What type of validity evidence is most important for an IQ test?
–How would you test the validity of an IQ test?
