The validity of an instrument refers to its ability to measure what it is supposed to measure and the extent to which it predicts outcomes.
- Face Validity
- Construct & Content Validity
- Convergent & Divergent Validity
- Predictive Validity
- Discriminant Validity
Reliability
Reliability is synonymous with consistency. It is the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications. No psychological test is completely consistent; however, a measurement that is unreliable is worthless. For example, suppose a student receives a score of 100 on one intelligence test and 114 on another, or imagine that every time you stepped on a scale it showed a different weight. Would you keep using these measurement tools? The consistency of test scores is critically important in determining whether a test can provide good measurement.
Reliability (cont.)
Because no unit of measurement is exact, any time you measure something (the observed score), you are really measuring two things: the true score and measurement error.
Measurement Error
Any fluctuation in test scores that results from factors related to the measurement process that are irrelevant to what is being measured. The difference between the observed score and the true score is called the error score:

S_true = S_observed − S_error

Developing better tests with less random measurement error is better than simply documenting the amount of error.

Measurement error is reduced by:
- Writing items clearly
- Making instructions easily understood
- Adhering to proper test administration procedures
- Providing consistent scoring
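As a rough illustration of this decomposition, the following Python sketch (not from the source; the score distributions are invented) simulates observed scores as true scores plus random error and recovers reliability as the share of observed-score variance due to true scores:

```python
import numpy as np

rng = np.random.default_rng(0)

n_examinees = 1000
true_scores = rng.normal(100, 15, n_examinees)  # latent true scores (S_true)
errors = rng.normal(0, 5, n_examinees)          # random measurement error (S_error)
observed = true_scores + errors                 # S_observed = S_true + S_error

# Reliability in classical test theory is the proportion of observed-score
# variance that is attributable to true-score variance.
reliability = true_scores.var() / observed.var()
print(f"simulated reliability ~ {reliability:.2f}")  # about 15**2/(15**2+5**2) = 0.90
```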
Determining Reliability
There are several ways that a measurement's reliability can be determined, depending on the type of measurement and the supporting data required. They include:
- Internal Consistency
- Test-retest Reliability
- Interrater Reliability
- Split-half Methods
- Odd-even Reliability
- Alternate Forms Methods
Internal Consistency
Measures the reliability of a test based solely on the number of items on the test and the intercorrelations among the items; in effect, it compares each item to every other item. If a scale is measuring a single construct, then the items on that scale should, overall, be highly correlated with one another.
There are two common ways of measuring internal consistency:
1. Cronbach's Alpha:
   .80 to .95 (Excellent)
   .70 to .80 (Very Good)
   .60 to .70 (Satisfactory)
   <.60 (Suspect)
2. Item-Total Correlations: the correlation of each item with the remainder of the items (.30 is the minimum acceptable item-total correlation).
Average Intercorrelation: the extent to which each item represents an observation of the same thing. With N items and an average intercorrelation r̄:

Reliability = (N × r̄) / (1 + (N − 1) × r̄)
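To make the two indices above concrete, here is a minimal Python sketch (illustrative only; the response data and function names are invented for this example) that computes Cronbach's alpha and item-total correlations for a small scale:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; items is a 2-D array (rows = respondents, cols = items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def item_total_correlations(items):
    """Correlation of each item with the sum of the remaining items."""
    totals = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], totals - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Invented Likert responses: 6 respondents on a 5-item scale.
scale = np.array([
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 2, 3],
    [1, 2, 1, 2, 1],
    [4, 4, 5, 4, 4],
])
print(f"alpha = {cronbach_alpha(scale):.2f}")
# Items below the .30 minimum item-total correlation are candidates for removal.
print("item-total r:", np.round(item_total_correlations(scale), 2))
```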
Possible Disadvantages:
- Many ways of splitting the test
- Each split yields a somewhat different reliability estimate
- Which is the real reliability of the test?
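The sketch below illustrates this point with simulated data (not from the source): every admissible split of the same six-item test yields a somewhat different Spearman-Brown corrected estimate.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
# Simulated 6-item test: one common factor plus item-specific noise.
factor = rng.normal(size=(200, 1))
items = factor + rng.normal(size=(200, 6))

def split_half(items, half):
    """Spearman-Brown corrected correlation between two half-test scores."""
    other = [j for j in range(items.shape[1]) if j not in half]
    a = items[:, list(half)].sum(axis=1)
    b = items[:, other].sum(axis=1)
    r = np.corrcoef(a, b)[0, 1]
    return 2 * r / (1 + r)  # Spearman-Brown prophecy formula

# Fixing item 0 in the first half avoids counting each split twice.
splits = [h for h in combinations(range(6), 3) if 0 in h]
estimates = [split_half(items, h) for h in splits]
print(f"{len(splits)} distinct splits, estimates range "
      f"{min(estimates):.2f} to {max(estimates):.2f}")
```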
Test-retest Reliability
Test-retest reliability is usually measured by computing the correlation coefficient between scores from two administrations of the same test.

Why might scores differ between administrations? Experience, changes over time in the characteristic being measured (e.g., a reading test), and carryover effects (e.g., remembering the test).

The higher the correlation (in a positive direction), the higher the test-retest reliability. The biggest problem with this type of reliability is what is called the memory effect: a respondent may recall answers from the original test, thereby inflating the reliability estimate.

Also, is it practical to administer the same test twice?
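As a minimal illustration (the scores below are invented), test-retest reliability reduces to the Pearson correlation between the two administrations:

```python
import numpy as np

# Invented scores for the same ten examinees on two administrations
# of the same test, several weeks apart.
time1 = np.array([88, 92, 75, 97, 83, 90, 78, 85, 94, 80])
time2 = np.array([90, 89, 78, 95, 85, 91, 75, 88, 92, 83])

# Test-retest reliability is simply the correlation between administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")
```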
Interrater Reliability
Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get are reliable or consistent. People are notorious for their inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We daydream. We misinterpret.
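The source does not name a specific index here, but one common way to quantify interrater consistency is Cohen's kappa, which corrects raw agreement between two raters for chance agreement. A minimal sketch with invented ratings:

```python
def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two raters classifying the same ten responses as pass/fail.
rater_a = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail",
           "pass", "pass", "pass", "pass", "pass"]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # ~0.47 for this data
```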
Alternate Forms Methods

Both forms are administered to the same person and the scores are correlated. If the two forms produce the same results, the instrument is considered reliable.
Advantages
Eliminates the problem of the memory effect. Reactivity effects (i.e., the experience of taking the test) are also partially controlled. Can sample a wider array of the entire content domain than the test-retest method.
Administrator Factors
Poor or unclear directions given during administration, or inaccurate scoring, can affect reliability. For example, suppose you were told that your scores on being social determined your promotion. The result is more likely to reflect what you think they want than what your behavior actually is.
Heterogeneity
Heterogeneity of the Items: the greater the heterogeneity of the items (differences in the kind or difficulty of the questions), the greater the chance for high reliability correlation coefficients.
Heterogeneity of the Group Members: the greater the heterogeneity of the group members in the preferences, skills, or behaviors being tested, the greater the chance for high reliability correlation coefficients.
Time lapse between administrations also matters: experience happens, and it influences how we see things, so a longer interval between testings yields lower reliability. Because internal consistency involves no time lapse, one can expect it to have the highest reliability correlation coefficient.
.80 or greater (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
Generalizability Theory
A theory of measurement that attempts to identify the sources of consistency and inconsistency in observed scores.
Allows for the evaluation of interaction effects from different types of error sources.
It is necessary to obtain multiple observations for the same group of individuals on all the variables that might contribute to measurement error (e.g., scores across occasions, across scorers, across alternative forms).
For example, measurements involving subjectivity (e.g., interviews, rating scales) involve bias; therefore, human judgment can be treated as a condition of measurement. If feasible, this is a more thorough procedure for identifying the error components that may enter scores.
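As an illustration of how generalizability theory partitions error, the sketch below (invented data; a fully crossed persons × raters design is assumed) estimates variance components from a two-way ANOVA and forms a generalizability (G) coefficient:

```python
import numpy as np

# Invented fully crossed design: 5 persons each scored by 3 raters.
scores = np.array([
    [7, 8, 7],
    [4, 5, 4],
    [9, 9, 8],
    [5, 6, 6],
    [3, 4, 3],
], dtype=float)

n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Mean squares from a two-way persons x raters ANOVA (one score per cell).
ms_person = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
ms_rater = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
residual = scores - person_means[:, None] - rater_means[None, :] + grand
ms_resid = (residual ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Variance components: how much each source contributes to score variation.
var_resid = ms_resid
var_person = max((ms_person - ms_resid) / n_r, 0.0)
var_rater = max((ms_rater - ms_resid) / n_p, 0.0)

# Generalizability coefficient for relative decisions averaged over n_r raters.
g_coef = var_person / (var_person + var_resid / n_r)
print(f"person={var_person:.2f}  rater={var_rater:.2f}  "
      f"residual={var_resid:.2f}  G={g_coef:.2f}")
```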