Psy 407 Reliability

Evaluation of Measurement Instruments
Reliability has to do with the consistency of the instrument.
- Internal Consistency (Consistency of the items)
- Test-retest Reliability (Consistency over time)
- Interrater Reliability (Consistency between raters)
- Split-half Methods
Correlation - a measure of the association between items/variables. Correlation coefficients range from -1 to +1; ignoring the sign, their size runs from 0 (no correlation) to 1 (perfect correlation).
A) Strength
Correlations can be weak (close to 0) or strong (close to 1 or -1)

B) Direction
- Positive: the variables go in the same direction (as one increases the other increases, or as one decreases the other decreases)
- Negative: the variables go in opposite directions (as one increases the other decreases)
Alcohol consumption and reaction time (positive or negative?)
Correlation does not imply causation!
- Breastfeeding and academic development (positive correlation)
- Rap music and violent behavior (positive correlation)
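To make strength and direction concrete, here is a minimal Python sketch of the Pearson correlation coefficient written out from its definition. The data are hypothetical numbers invented for illustration, and the helper name `pearson_r` is our own, not from any library. The sign of the result gives the direction; its absolute value gives the strength.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (covariance numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical toy data (invented for illustration only)
drinks = [0, 1, 2, 3, 4]
reaction_ms = [250, 262, 281, 295, 320]  # rises as drinks rise
accuracy = [98, 95, 90, 84, 77]          # falls as drinks rise

r_pos = pearson_r(drinks, reaction_ms)
r_neg = pearson_r(drinks, accuracy)
print(f"drinks vs reaction time: r = {r_pos:.2f}")
print(f"drinks vs accuracy:      r = {r_neg:.2f}")
```

Both hypothetical relationships are strong (absolute value near 1) but point in opposite directions. Neither number says anything about what causes what.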
Reliability
Reliability is synonymous with consistency. It is the degree to which test scores for an individual test taker or a group of test takers are consistent over repeated applications. No psychological test is completely consistent; however, a measurement that is unreliable is worthless. For example, a student receives a score of 100 on one intelligence test and 114 on another, or imagine that every time you stepped on a scale it showed a different weight. Would you keep using these measurement tools? The consistency of test scores is critically important in determining whether a test can provide good measurement.
Reliability (cont.)
Because no measurement is exact, any time you measure something (the observed score), you are really measuring two things:
1. True Score - the part of the observed score that truly represents what you are intending to measure.
2. Error Component - the part of the observed score produced by other variables.
Observed Test Score = True Score + Errors of Measurement
For example, personality scores from the MMPI may reflect your true personality and also: a) your mood that day; b) what you ate that morning; c) the actions of the tester; and d) bias in the test itself.
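A small simulation can show how Observed Test Score = True Score + Errors of Measurement plays out across a group. This is a minimal sketch under stated assumptions: true scores and errors are normally distributed, the errors are random and independent of the true scores, and every number (group size, spreads) is hypothetical. In classical test theory, reliability is the share of observed-score variance that comes from true-score variance.

```python
import random
import statistics

random.seed(407)  # fixed seed so the run is reproducible

N = 20_000        # hypothetical number of test takers
TRUE_SD = 10.0    # assumed spread of true scores
ERROR_SD = 5.0    # assumed spread of random measurement error

true_scores = [random.gauss(100, TRUE_SD) for _ in range(N)]
errors = [random.gauss(0, ERROR_SD) for _ in range(N)]  # independent of true scores

# Observed Test Score = True Score + Errors of Measurement
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)
var_obs = statistics.pvariance(observed)
reliability = var_true / var_obs  # share of observed variance that is "true"

print(f"estimated reliability: {reliability:.3f}")
print(f"theoretical value:     {TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2):.3f}")
```

With these assumed spreads the theoretical value is 100 / (100 + 25) = .80. Shrinking `ERROR_SD` (less random error) pushes the estimate toward 1.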
Why Do Test Scores Vary?

Possible Sources of Variability of Scores (pg. 110)
- General ability to comprehend instructions
- Stable response sets (e.g., choosing option C more often)
- The element of chance in getting a question right
- Conditions of testing
- Unreliability or bias in grading or rating performance
- Motivation
- Emotional strain
Measurement Error
Measurement error is any fluctuation in test scores that results from factors related to the measurement process that are irrelevant to what is being measured. The difference between the observed score and the true score is called the error score:
$S_{\text{true}} = S_{\text{observed}} - S_{\text{error}}$
Developing better tests with less random measurement error is more useful than simply documenting the amount of error.
Measurement error is reduced by:
- Writing items clearly
- Making instructions easy to understand
- Adhering to proper test administration
- Providing consistent scoring
Determining Reliability
There are several ways that a measurement's reliability can be determined, depending on the type of measurement and the supporting data required. They include:
- Internal Consistency
- Test-retest Reliability
- Interrater Reliability
- Split-half Methods
- Odd-even Reliability
- Alternate Forms Methods
Internal Consistency
Measures the reliability of a test based solely on the number of items on the test and the intercorrelations among the items; in effect, it compares each item to every other item. If a scale is measuring a construct, then overall the items on that scale should be highly correlated with one another.
There are two common ways of measuring internal consistency:
1. Cronbach's Alpha
- .80 to .95 (Excellent)
- .70 to .80 (Very Good)
- .60 to .70 (Satisfactory)
- <.60 (Suspect)
2. Item-Total Correlations - the correlation of each item with the sum of the remaining items (.30 is the minimum acceptable item-total correlation).
Split Half & Odd-Even Reliability

Split Half - refers to determining a correlation between the first half of the measurement and the second half of the measurement (i.e., we would expect answers to the first half to be similar to answers to the second half).
Odd-Even - refers to the correlation between the odd-numbered items and the even-numbered items of a measurement tool. In this sense, we are using a single test to create two tests, eliminating the need for additional items and multiple administrations.
Because both methods need only one administration and form the two groups from the test's own items, they are referred to as internal consistency measures.
Split Half & Odd-Even Reliability

Possible Advantages
- Simplest method; easy to perform
- Time and cost effective
Possible Disadvantages
- There are many ways of splitting a test
- Each split yields a somewhat different reliability estimate
- Which is the real reliability of the test?
Test-retest Reliability
Test-retest reliability is usually measured by computing the correlation coefficient between the scores from two administrations of the same test to the same people.
Test-retest Reliability (cont.)

The amount of time allowed between measures is critical. The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because observations taken close together in time share more of the same influences, while a longer gap leaves more room for real change. The optimum time between administrations is 2 to 4 weeks. If a scale is measuring a construct consistently, then there should not be radical changes in the scores between administrations, unless something significant happened. The rationale behind this method is that any difference between the test and retest scores should be due solely to measurement error.

It is hard to specify one acceptable test-retest correlation, since what is considered acceptable depends on the type of scale, the use of the scale, and the time between testings. For example, it is not always clear whether differences in test scores should be regarded as measurement error or as real change in what is being measured.
Why might scores differ between tests? Experience; the characteristic being measured may change over time (e.g., on a reading test); carryover effects (e.g., remembering the test).

A correlation of at least .50 is expected.
The higher the correlation (in a positive direction), the higher the test-retest reliability. The biggest problem with this type of reliability is what is called the memory effect: a respondent may recall the answers from the original test, which inflates the reliability estimate.
Also, is it practical?
Interrater Reliability
Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get are reliable or consistent. People are notorious for their inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We daydream. We misinterpret.
Interrater Reliability (cont.)

For some scales it is important to assess interrater reliability. Interrater reliability means that if two different raters scored the scale using the scoring rules, they should obtain the same result. Interrater reliability is usually measured by computing the correlation coefficient between the scores of two raters for the same set of respondents. Here the criterion of acceptability is fairly high (e.g., a correlation of at least .9), but what is considered acceptable will vary from situation to situation.
Factors Affecting Reliability

- Administrator Factors
- Number of Items on the Instrument
- The Test Taker
- Heterogeneity of the Items
- Heterogeneity of the Group Members
- Length of Time between Test and Retest
Administrator Factors
Poor or unclear directions given during administration, or inaccurate scoring, can affect reliability. For example, suppose you were told that your score on sociability would determine your promotion. Your responses are then more likely to reflect what you think they want than how you actually behave.
Number of Items on the Instrument

The larger the number of items, the greater the chance for high reliability. For example, it makes sense that twenty questions on your leadership style are more likely to produce a consistent result than four questions. Remedy: use longer tests or accumulate scores from short tests.
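The effect of test length can be quantified with the Spearman-Brown prophecy formula, a standard result that the slide does not name. It predicts the new reliability when a test is lengthened by a factor n using comparable items: n*r / (1 + (n - 1)*r). This is a minimal sketch; the starting reliability of .60 for a 4-item scale is a hypothetical value, and the prediction assumes the added items behave like the original ones.

```python
def spearman_brown_prophecy(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    using items comparable to the existing ones."""
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)

# Hypothetical: a 4-item leadership-style scale with reliability .60
BASE_ITEMS = 4
BASE_RELIABILITY = 0.60

for n_items in (4, 8, 12, 20):
    factor = n_items / BASE_ITEMS
    predicted = spearman_brown_prophecy(BASE_RELIABILITY, factor)
    print(f"{n_items:>2} items: predicted reliability = {predicted:.2f}")
```

With these assumed numbers, going from 4 to 20 items raises the predicted reliability from .60 to about .88, with diminishing returns as the test grows.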
The Test Taker

For example, if you took an instrument in August when you had a terrible flu and then again in December when you were feeling quite good, we might see a difference in your responses. If you were under considerable stress or were interrupted while answering the instrument's questions, you might also give different responses.
Heterogeneity
Heterogeneity of the Items - The greater the heterogeneity of the items (differences in the ways that the same issue is assessed), the greater the chance for high reliability correlation coefficients.
- You ask the same question in different ways.
- Clients cannot easily determine what you are trying to assess, so they are less able to fake their answers.
Length of Time between Test and Retest

The shorter the time, the greater the chance for high reliability correlation coefficients. As we have experiences, we tend to adjust our views a little from time to time. Therefore, the time interval between the first time we took an instrument and the second time is really an "experience" interval.
Experience happens, and it influences how we see things. Because internal consistency has no time lapse, one can expect it to have the highest reliability correlation coefficient.
How High Should Reliability Be?

A highly reliable test is always preferable to a test with lower reliability.
- .80 or greater (Excellent)
- .70 to .80 (Very Good)
- .60 to .70 (Satisfactory)
- <.60 (Suspect)
A reliability coefficient of .80 indicates that only 20% of the variability in test scores is due to measurement error.
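The bands and the "20% error" reading can be combined in a small lookup. This is a minimal sketch with an illustrative function name (`interpret_reliability`); because the slide's bands share their endpoints (.70 to .80, .80 or greater), it assumes a value sitting exactly on a boundary belongs to the higher band.

```python
def interpret_reliability(r):
    """Label a reliability coefficient using the slide's bands and report
    the share of score variance attributed to measurement error."""
    if r >= 0.80:
        label = "Excellent"
    elif r >= 0.70:      # boundary values go to the higher band (assumption)
        label = "Very Good"
    elif r >= 0.60:
        label = "Satisfactory"
    else:
        label = "Suspect"
    error_share = 1 - r  # e.g., r = .80 leaves 20% error variance
    return label, error_share

for r in (0.92, 0.80, 0.74, 0.65, 0.50):
    label, err = interpret_reliability(r)
    print(f"r = {r:.2f}: {label:<12} error variance = {err:.0%}")
```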
Is there a trait for kindness? Aggression? Are we simply the sum total of our environment and our experiences?
