
PSY 6535 Psychometric Theory Validity Part 1

Overview
Content validity
Criterion-related validity

Issues of Validity
Does the test actually measure what it is purported to measure?
Do differences in test scores reflect true differences in the underlying construct?
Are inferences based on the test scores justified?

Example: Validity of a Measure


The polygraph (lie detector test) is not nearly as valid as some claim, can easily be beaten, and should never be admitted into evidence in courts of law, say psychologists from two scientific communities who were surveyed on the validity of polygraphs. APA News Release

Validity is About Inferences.


Cronbach (1971): Validation is the process of collecting evidence to support the types of inferences that are drawn from test scores.
"Validity is the degree to which all of the accumulated evidence supports the intended interpretation of test scores for the intended purpose." (AERA, APA, NCME, 1999, p. 11)

Validity for what?


Inferences and decisions based on test scores. A person with this score is likely to:
Be a better parent
Do well in law school
Be most satisfied as an engineer
Steal from his/her employer

Types of Validity
Content (more theory-based)
Criterion-related (more data-based)
Construct (general evidence gathering)

Content Validity of a Measure


Collectively, do the items adequately represent all of the domains of the construct of interest?
Starting point: a well-defined construct.
Often a panel of experts judges whether the items adequately sample the domain of interest.

Example: 1st Grade Math Objectives


What 1st graders in School District X should be able to do:
A. Add any two positive numbers whose sum is 20 or less.
B. Subtract any two numbers (each less than 15) whose difference is a positive number.

Item Pool: Which are Content Valid?


1. 13 + 2 = ___
2. 12 - 5 = ___
3. 10 - 13 = ___
4. 26 - 15 = ___
5. 13 + 4 - 7 = ___
6. Sammy has 10 pennies. He lost 2. How many pennies does Sammy have now?
   A. 2 pennies; B. 8 pennies; C. 10 pennies; D. 12 pennies

Example: Depression
(Modified from the DSM-IV)
A complex of symptoms marked by:
Disruptions in appetite and weight
Insomnia or hypersomnia
Loss of interest or pleasure in activities
Loss of energy
Feelings of worthlessness
Feeling sad or empty nearly every day
Frequent death-related thoughts

Item Pool: Which are Content Valid?


I feel blue or sad.
I feel nervous when speaking to someone in authority.
I have crying spells.
I'm always willing to admit it when I make a mistake.
I felt that everything I did was an effort.
I never resent being asked to return a favor.
I experience spells of terror or panic.

Assessing Content Validity


Steps for assessing content validity:
1. Describe the content domain
2. Determine the areas of the content domain that are measured by each item
3. Compare the structure of the test with the structure of the content domain

Challenges:
Difficulty defining the domain
Categorizing the content domain and mapping items to the categories
Ensuring representativeness

Contamination & Deficiency


[Venn diagram: the Construct and the Measure overlap. The overlap is Relevance (content validity); the part of the Construct not covered by the Measure is Deficiency; the part of the Measure outside the Construct is Contamination.]

What do we want?
A measure that samples from all important domains or aspects (low Deficiency)
A measure that does not include anything irrelevant (low Contamination)
That is, a measure that adequately captures all of the domains of the construct it is intended to measure (high Content Validity)

Criterion-related Evidence for a Measure


What should this test predict? What inferences are we going to use this test to make?
Criterion-related validation is data based: does the test actually predict the behavior it is supposed to predict?
Correlate an honesty test with employee theft
Correlate a paper-and-pencil measure of delinquency with arrest records
Correlate a measure of study habits with actual grades

Two Main Types of Criterion-Related Validity


Predictive validity: future criteria
Concurrent validity: current criteria

Criterion-related validity: Concurrent validity


Students who have been admitted to Wayne State take the SAT, and their GPA is recorded at the same time. The correlation between the test scores and GPA is computed. This correlation is sometimes called a validity coefficient.

Criterion-related validity: Predictive validity


Students take the SAT (or ACT) during high school, and some are then admitted to Wayne State. Later, their SAT scores are correlated with their college GPA. This correlation is also sometimes called a validity coefficient. If SAT scores and college GPA are correlated, then the SAT has some degree of predictive validity for predicting college GPA.
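A validity coefficient is just the Pearson correlation between test scores and the criterion. The sketch below computes one from scratch on made-up SAT/GPA numbers (the data are hypothetical, chosen only to illustrate the calculation):

```python
# Sketch: computing a validity coefficient as the Pearson correlation
# between predictor (test) scores and criterion scores.
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

sat = [1050, 1200, 980, 1340, 1100, 1260]  # hypothetical SAT scores
gpa = [2.8, 3.2, 2.5, 3.7, 3.0, 3.4]       # hypothetical college GPAs
print(round(pearson_r(sat, gpa), 2))       # 0.99
```

A real validity coefficient would come from a much larger sample; six tidy cases like these produce an unrealistically high correlation.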

Problem: Small Samples = Imprecise Estimates


Sample Size   Observed Correlation   Lower Bound of 95% CI   Upper Bound of 95% CI
10            .50                    -.33                    .89
20            .50                    .04                     .79
50            .50                    .25                     .69
100           .50                    .33                     .64
200           .50                    .39                     .60
400           .50                    .42                     .57
1000          .50                    .45                     .55
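The narrowing of the interval with sample size can be reproduced with the standard Fisher z approximation. The slide's exact bounds may come from a slightly different method (they diverge a little at the smallest n), but the pattern is the same:

```python
# Sketch: approximate 95% confidence interval for a correlation using
# the Fisher z transform. Larger samples give narrower intervals.
import math

def fisher_ci(r: float, n: int, z_crit: float = 1.96):
    """95% CI for a sample correlation r based on n observations."""
    z = math.atanh(r)                # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)      # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to r scale

for n in (10, 20, 50, 100, 200, 400, 1000):
    lo, hi = fisher_ci(0.50, n)
    print(f"n={n:4d}  95% CI: [{lo:+.2f}, {hi:+.2f}]")
```

Note that with n = 10 the interval includes zero: an observed r of .50 from ten cases is compatible with no relationship at all.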

Problem: Range Restriction


Range restriction: the variance in scores in the sample at hand is smaller than the variance in scores in the population of interest.
Range restriction is thought to reduce the observed correlation between test scores and criterion measures. (Exceptions are possible.)
In the previous examples, where was the restriction, and why was it there?

Example: range restriction

[Figure: scatterplots of Job Performance against General cognitive ability, illustrating how restricting the range of scores affects the observed correlation.]

When/where might we find range restriction?


A sample of employees chosen based on high test scores and interview scores (high scores on the predictor)
A sample of current employees promoted due to high performance (high scores on the criterion measure)
In both cases, variability is being reduced (either in the predictor variable or in the criterion variable)
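The hiring scenario above can be simulated directly: generate ability and performance scores with a known correlation, then keep only the "hired" top scorers and watch the observed correlation shrink. The numbers here are simulated, not real employee data:

```python
# Sketch: simulating range restriction. Selecting only high scorers on
# the predictor reduces its variance, which shrinks the observed
# predictor-criterion correlation.
import random

random.seed(1)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Ability scores, and performance = ability + noise (population r near .7).
ability = [random.gauss(0, 1) for _ in range(5000)]
performance = [a + random.gauss(0, 1) for a in ability]

r_full = pearson_r(ability, performance)

# Keep only "hired" applicants: the top of the ability distribution.
hired = [(a, p) for a, p in zip(ability, performance) if a > 1.0]
r_restricted = pearson_r(*zip(*hired))

print(f"full range: r = {r_full:.2f}; restricted: r = {r_restricted:.2f}")
```

The restricted correlation comes out markedly lower than the full-range one, even though the underlying ability-performance relationship is identical in both groups.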

Measurement Error
Reliability: an index of the presence of measurement error (a reliability of 1.0 = no error)
Unreliability in the predictor and criterion serves to reduce (attenuate) their observed correlation
Researchers are often concerned about attenuation in predictor-criterion associations

When/where might we find unreliability? Everywhere!


Tests used as predictors (e.g., measures of depression)
Criterion measures (e.g., ratings of client well-being)
Unreliability is a concern for both predictors and criteria, and unreliability in both can reduce correlations

Assume that measures of X and Y have alphas of .60 and .70, respectively. The observed r between X and Y is .40. However, we might want to know how much this correlation is depressed by measurement error.

Correction for Attenuation


rc
Where:
rxy = observed correlation between x and y rxx and ryy = reliability coefficients for x and y

rxy rxx ryy

Correcting for Measurement Error


Reliability Measure x   Reliability Measure y   Observed Correlation   Corrected Correlation
.50                     .60                     .40                    .73
.60                     .70                     .40                    .62
.70                     .80                     .40                    .53
.80                     .90                     .40                    .47
.90                     .90                     .40                    .44
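The correction formula is a one-liner, and the table's values fall straight out of it:

```python
# Sketch: correction for attenuation, r_c = r_xy / sqrt(r_xx * r_yy).
# Estimates what the x-y correlation would be if both variables were
# measured without error.
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Disattenuate an observed correlation given the two reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Reproduce the table rows (observed r = .40 throughout):
for r_xx, r_yy in [(.50, .60), (.60, .70), (.70, .80), (.80, .90), (.90, .90)]:
    rc = correct_for_attenuation(0.40, r_xx, r_yy)
    print(f"r_xx={r_xx:.2f}, r_yy={r_yy:.2f} -> corrected r = {rc:.2f}")
```

Notice the pattern: the less reliable the measures, the larger the gap between the observed and corrected correlations.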

Summary Issues Criterion-related Validity


What sample will we use?
Small samples mean more imprecision in the correlation estimate
Issues of generalization

What is our criterion? How do we measure it?
Variability is needed for both predictor and criterion variables
Attenuation due to measurement error

Predictor-Criterion Overlap
Same items on both measures: bad!
