
PSY 6535 Psychometric Theory Validity Part 1

Overview
Content validity
Criterion-related validity

Issues of Validity
Does the test actually measure what it is purported to measure?
Do differences in test scores reflect true differences in the underlying construct?
Are inferences based on the test scores justified?

Example: Validity of a Measure


The polygraph (lie detector test) is not nearly as valid as some claim, can easily be beaten, and should never be admitted into evidence in courts of law, say psychologists from two scientific communities who were surveyed on the validity of polygraphs. APA News Release

Validity is About Inferences.


Cronbach (1971): Validation is the process of collecting evidence to support the types of inferences that are drawn from test scores.
"Validity is the degree to which all of the accumulated evidence supports the intended interpretation of test scores for the intended purpose." (AERA, APA, NCME, 1999, p. 11)

Validity for what?


Inferences and decisions based on test scores. A person with this score is likely to:
Be a better parent
Do well in law school
Be most satisfied as an engineer
Steal from his/her employer

Types of Validity
Content (more theory-based)
Criterion-related (more data-based)
Construct (general evidence gathering)

Content Validity of a Measure


Collectively, do the items adequately represent all of the domains of the construct of interest?
Starting point: a well-defined construct.
Often a panel of experts judges whether the items adequately sample the domain of interest.

Example: 1st Grade Math Objectives


What 1st graders in School District X should be able to do:
A. Add any two positive numbers whose sum is 20 or less.
B. Subtract any two numbers (each less than 15) whose difference is a positive number.

Item Pool: Which are Content Valid?


1. 13 + 2 = ___
2. 12 - 5 = ___
3. 10 - 13 = ___
4. 26 - 15 = ___
5. 13 + 4 - 7 = ___
6. Sammy has 10 pennies. He lost 2. How many pennies does Sammy have now?
   A. 2 pennies; B. 8 pennies; C. 10 pennies; D. 12 pennies

Example: Depression
(Modified from the DSM-IV)
A complex of symptoms marked by:
Disruptions in appetite and weight
Insomnia or hypersomnia
Loss of interest or pleasure in activities
Loss of energy
Feelings of worthlessness
Feeling sad or empty nearly every day
Frequent death-related thoughts

Item Pool: Which are Content Valid?


I feel blue or sad.
I feel nervous when speaking to someone in authority.
I have crying spells.
I'm always willing to admit it when I make a mistake.
I felt that everything I did was an effort.
I never resent being asked to return a favor.
I experience spells of terror or panic.

Assessing Content Validity


Steps for assessing content validity:
1. Describe the content domain
2. Determine the areas of the content domain that are measured by each item
3. Compare the structure of the test with the structure of the content domain

Challenges:
Difficulty defining the domain
Categorizing the content domain and mapping items to the categories
Ensuring representativeness

Contamination & Deficiency


[Venn diagram: the Construct and the Measure overlap. The overlap is Relevance (content validity); the part of the Construct not covered by the Measure is Deficiency; the part of the Measure outside the Construct is Contamination.]

What do we want?
A measure that samples from all important domains or aspects (low Deficiency)
A measure that does not include anything irrelevant (low Contamination)
That is, a measure that adequately captures all of the domains of the construct it is intended to measure (high Content Validity)

Criterion-related Evidence for a Measure


What should this test predict? What inferences are we going to use this test to make?
Criterion-related validation is data based: does the test actually predict the behavior it is supposed to predict?
Correlate an honesty test with employee theft
Correlate a paper-and-pencil measure of delinquency with arrest records
Correlate a measure of study habits with actual grades

Two Main Types of Criterion-Related Validity


Predictive validity: future criteria
Concurrent validity: current criteria

Criterion-related validity: Concurrent validity


Students who have been admitted to Wayne State take the SAT, and their GPA is recorded at the same time. The correlation between the test scores and GPA is computed. This correlation is sometimes called a validity coefficient.

Criterion-related validity: Predictive validity


Students take the SAT (or ACT) during high school, and some are then admitted to Wayne State. Later, their SAT scores are correlated with their college GPA. This correlation is also sometimes called a validity coefficient. If SAT scores and college GPA are correlated, then the SAT has some degree of predictive validity for predicting college GPA.
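A validity coefficient is just the Pearson correlation between test scores and the criterion. The sketch below computes one from scratch on made-up SAT/GPA numbers (the data are hypothetical, chosen only to illustrate the calculation):

```python
# Sketch: computing a validity coefficient as the Pearson correlation
# between predictor (test) scores and criterion scores.
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

sat = [1050, 1200, 980, 1340, 1100, 1260]  # hypothetical SAT scores
gpa = [2.8, 3.2, 2.5, 3.7, 3.0, 3.4]       # hypothetical college GPAs
print(round(pearson_r(sat, gpa), 2))       # 0.99
```

A real validity coefficient would come from a much larger sample; six tidy cases like these produce an unrealistically high correlation.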

Problem: Small Samples = Imprecise Estimates


Sample Size   Observed Correlation   Lower Bound of 95% CI   Upper Bound of 95% CI
10            .50                    -.33                    .89
20            .50                    .04                     .79
50            .50                    .25                     .69
100           .50                    .33                     .64
200           .50                    .39                     .60
400           .50                    .42                     .57
1000          .50                    .45                     .55
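The narrowing of the interval with sample size can be reproduced with the standard Fisher z approximation. The slide's exact bounds may come from a slightly different method (they diverge a little at the smallest n), but the pattern is the same:

```python
# Sketch: approximate 95% confidence interval for a correlation using
# the Fisher z transform. Larger samples give narrower intervals.
import math

def fisher_ci(r: float, n: int, z_crit: float = 1.96):
    """95% CI for a sample correlation r based on n observations."""
    z = math.atanh(r)                # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to r scale

for n in (10, 20, 50, 100, 200, 400, 1000):
    lo, hi = fisher_ci(0.50, n)
    print(f"n={n:4d}  95% CI: [{lo:+.2f}, {hi:+.2f}]")
```

Note that with n = 10 the interval includes zero: an observed r of .50 from ten cases is compatible with no relationship at all.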

Problem: Range Restriction


Range restriction: the variance in scores in the sample at hand is smaller than the variance in scores in the population of interest.
Range restriction is thought to reduce the observed correlation between test scores and criterion measures. (Exceptions are possible.)
In the previous examples, where was the restriction, and why was it there?

Example: range restriction

[Figure: scatterplots of Job Performance against General cognitive ability, illustrating how restricting the range of scores affects the observed correlation.]

When/where might we find range restriction?


A sample of employees chosen based on high test scores and interview scores (high scores on the predictor)
A sample of current employees promoted due to high performance (high scores on the criterion measure)
In both cases, variability is being reduced (either in the predictor variable or in the criterion variable)
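The hiring scenario above can be simulated directly: generate ability and performance scores with a known correlation, then keep only the "hired" top scorers and watch the observed correlation shrink. The numbers here are simulated, not real employee data:

```python
# Sketch: simulating range restriction. Selecting only high scorers on
# the predictor reduces its variance, which shrinks the observed
# predictor-criterion correlation.
import random

random.seed(1)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Ability scores, and performance = ability + noise (population r near .7).
ability = [random.gauss(0, 1) for _ in range(5000)]
performance = [a + random.gauss(0, 1) for a in ability]

r_full = pearson_r(ability, performance)

# Keep only "hired" applicants: the top of the ability distribution.
hired = [(a, p) for a, p in zip(ability, performance) if a > 1.0]
r_restricted = pearson_r(*zip(*hired))

print(f"full range: r = {r_full:.2f}; restricted: r = {r_restricted:.2f}")
```

The restricted correlation comes out markedly lower than the full-range one, even though the underlying ability-performance relationship is identical in both groups.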

Measurement Error
Reliability: an index of the presence of measurement error (a reliability of 1.0 = no error)
Unreliability in the predictor and criterion serves to reduce (attenuate) their observed correlation
Researchers are often concerned about attenuation in predictor-criterion associations

When/where might we find unreliability? Everywhere!


Tests used as predictors (e.g., measures of depression)
Criterion measures (e.g., ratings of client well-being)
Unreliability is a concern for both predictors and criteria, and unreliability in both can reduce correlations

Assume that measures of X and Y have alphas of .60 and .70, respectively. The observed r between X and Y is .40. However, we might want to know how much this correlation is depressed by measurement error.

Correction for Attenuation


rc
Where:
rxy = observed correlation between x and y rxx and ryy = reliability coefficients for x and y

rxy rxx ryy

Correcting for Measurement Error


Reliability Measure x   Reliability Measure y   Observed Correlation   Corrected Correlation
.50                     .60                     .40                    .73
.60                     .70                     .40                    .62
.70                     .80                     .40                    .53
.80                     .90                     .40                    .47
.90                     .90                     .40                    .44
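The correction formula is a one-liner, and the table's values fall straight out of it:

```python
# Sketch: correction for attenuation, r_c = r_xy / sqrt(r_xx * r_yy).
# Estimates what the x-y correlation would be if both variables were
# measured without error.
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Disattenuate an observed correlation given the two reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Reproduce the table rows (observed r = .40 throughout):
for r_xx, r_yy in [(.50, .60), (.60, .70), (.70, .80), (.80, .90), (.90, .90)]:
    rc = correct_for_attenuation(0.40, r_xx, r_yy)
    print(f"r_xx={r_xx:.2f}, r_yy={r_yy:.2f} -> corrected r = {rc:.2f}")
```

Notice the pattern: the less reliable the measures, the larger the gap between the observed and corrected correlations.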

Summary Issues Criterion-related Validity


What sample will we use?
Small samples mean more imprecision in the correlation estimate
Issues of generalization

What is our criterion? How do we measure it?
Variability is needed for both predictor and criterion variables
Attenuation due to measurement error

Predictor-Criterion Overlap
Same items on both measures: bad!
