
Measurement Error

Whatever measurement we might make with regard to some psychological construct, we do so with some amount of error
Any observed score for an individual is their true score with error added in
There are different types of error, but here we are concerned with a measure's inability to capture the true response for an individual
Observed Score = True score + Error of measurement
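In standard classical test theory notation (a textbook formulation added here for reference, not taken from these slides), that decomposition and the reliability coefficient it implies can be written as:

```latex
% Classical test theory: observed score X, true score T, error E
X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E \quad (\text{assuming } \operatorname{Cov}(T, E) = 0)

% Reliability: the proportion of observed-score variance that is true-score variance
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```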

Reliability
Reliability refers to a measure's ability to capture an individual's true score, i.e. to accurately distinguish one person from another

While a reliable measure will be consistent, consistency can actually be seen as a by-product of reliability, and in a case where we had perfect consistency (everyone scores the same and gets the same score repeatedly), reliability coefficients could not be calculated
No variance/covariance to give a correlation

The error in our analyses is due to individual differences but also to the measure not being perfectly reliable

Reliability
Criteria of reliability
Test-retest
Test components (internal consistency)

Test-retest reliability
Consistency of measurement for individuals over time
Scores should be similar, e.g. today and 6 months from now

Issues
Memory
If the two administrations are too close in time, the correlation between scores is due to memory of item responses rather than the true score being captured
Chance covariation
Any two variables will essentially always have a non-zero sample correlation
Reliability is not constant across subsets of a population
General population IQ scores: good reliability
IQ scores for college students: less reliable
Restriction of range, fewer individual differences (see the sketch below)
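A minimal sketch of how a test-retest coefficient might be computed, and of how restriction of range lowers it, using simulated data (the sample size, the reliability of .80 built into the simulation, and the cutoff are arbitrary illustration values, not figures from these slides):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical data: true scores plus independent measurement error
# at two occasions; reliability is ~.80 by construction.
n = 1000
true_score = rng.normal(100, 15, n)          # IQ-like true scores
time1 = true_score + rng.normal(0, 7.5, n)   # observed score, occasion 1
time2 = true_score + rng.normal(0, 7.5, n)   # observed score, occasion 2

r_full, _ = pearsonr(time1, time2)
print(f"Test-retest r, full range:       {r_full:.2f}")

# Restriction of range: keep only the top quarter of the distribution
# (analogous to studying college students only) and the correlation drops.
restricted = time1 > np.percentile(time1, 75)
r_restricted, _ = pearsonr(time1[restricted], time2[restricted])
print(f"Test-retest r, restricted range: {r_restricted:.2f}")
```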

Internal Consistency
We can get a sort of average correlation among items to assess the reliability of some measure (e.g. Cronbach's alpha)

As one would most likely intuitively assume, having more measures of something is better than having few
Having more items which correlate with one another will increase the test's reliability (see the sketch below)
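A small sketch of internal consistency in practice, assuming Cronbach's alpha as the "average correlation among items" summary and the Spearman-Brown formula for the effect of lengthening the test; the simulated data and item counts are hypothetical:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    k = items.shape[1]
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def spearman_brown(rel: float, factor: float) -> float:
    """Projected reliability if the test is lengthened by `factor`."""
    return (factor * rel) / (1 + (factor - 1) * rel)

# Hypothetical data: a common latent trait plus item-specific noise.
rng = np.random.default_rng(1)
n, k = 500, 10
trait = rng.normal(size=(n, 1))
items = trait + rng.normal(scale=1.0, size=(n, k))

alpha = cronbach_alpha(items)
print(f"alpha with {k} items:             {alpha:.2f}")
print(f"projected alpha with {2*k} items: {spearman_brown(alpha, 2):.2f}")
```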

What's good reliability?

While we have conventions, it really depends on the situation
As mentioned, the reliability of a measure may be different for different groups of people

What we may need to do is compare our reliability to that of measures already in place and deemed good, as well as get interval estimates to provide an assessment of the uncertainty in our reliability estimate
Note also that reliability estimates are upwardly biased and so are a bit optimistic
Also, many of our techniques do not take into account the reliability of our measures, and poor reliability can result in lower statistical power, i.e. an increase in Type II error (see the sketch below)
Though technically, increasing reliability can potentially also lower power
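A rough Monte Carlo sketch of the power point above: unreliability in both variables attenuates the observed correlation and so lowers the chance of detecting a true effect. The true correlation of .3, n = 50, and the reliability values are arbitrary illustration settings, not numbers from these slides:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

def power_at_reliability(reliability, n=50, n_sims=2000, true_r=0.3, alpha=0.05):
    """Monte Carlo estimate of power to detect a true correlation of `true_r`
    when both variables are measured with the given reliability."""
    error_sd = np.sqrt(1 / reliability - 1)   # error SD that yields this reliability
    cov = [[1.0, true_r], [true_r, 1.0]]      # correlation of the true scores
    hits = 0
    for _ in range(n_sims):
        true_xy = rng.multivariate_normal([0, 0], cov, size=n)
        x = true_xy[:, 0] + rng.normal(0, error_sd, n)   # observed = true + error
        y = true_xy[:, 1] + rng.normal(0, error_sd, n)
        _, p = pearsonr(x, y)
        hits += p < alpha
    return hits / n_sims

for rel in (1.0, 0.8, 0.6):
    print(f"reliability {rel:.1f}: estimated power ~ {power_at_reliability(rel):.2f}")
```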

Replication and Reliability


While reliability implies replicability, assessing reliability does not provide a probability of replication

Note also that statistical significance is not a measure of reliability or replicability

Replication is perhaps not conducted as much as it should be in psychology, for a number of reasons
Practical concerns, lack of publishing outlets, etc.

Furthermore, knowing our estimates are biased and variable themselves, we might even think that in many cases we would not expect consistent research findings
In psychology, many people spend a lot of time debating back and forth about the merits of some theory, citing cases where it did or did not replicate
However, the lack of replication could be due to low power, low reliability, problem data, incorrectly carrying out the experiment, etc.
In other words, the finding didn't repeat because of methodology, not because the theory was wrong

Factors affecting the utility of replications


You can't step in the same river twice!
Heraclitus

When
Later replications do not provide as much new information individually; however, they can contribute greatly to the overall assessment of an effect
Meta-analysis

How
There is no perfect replication (different people involved, time it takes to conduct, etc.)


Doing an exact replication gives us more confidence in the original finding (should it hold), but may not offer much in the way of generalization

Example: doing a gender difference study at UNT over and over. Does it
work for non-college folk? People outside of Texas?

Factors affecting the utility of replications


By whom
It is well known that those with a vested interest in some idea tend to find confirming evidence more than those that don't

Replications by others are still being done by those with an interest in that research topic, and so may have a precorrelation inherent in their attempt

Direct: correlation of attributes of persons involved


Indirect: correlation of data to be obtained

The gist: we can't have truly independent replication attempts, but we must strive to minimize bias

The more independent replication attempts are, the more informative they will be

Validity
Validity refers to the question of whether our measurements are actually hitting on the construct we think they are
While we can obtain specific statistics for reliability (even different types), validity is more of a global assessment based on the evidence available
We can have reliable measurements that are invalid
Classic example: the scale which is consistent and able to distinguish one person from the next, but is actually off by 5 pounds

Validity Criteria in Psychological Testing


Content validity
Criterion validity
Concurrent
Predictive

Construct-related validity
Convergent
Discriminant

Content validity
Items represent the kinds of material (or content areas) they are supposed to
represent
Are the questions worth a flip, in the sense that they cover all domains of a given construct?

E.g. job satisfaction = salary, relationship w/ boss, relationship w/ coworkers etc.

Validity Criteria in Psychological Testing


Criterion validity
The degree to which the measure correlates with various outcomes
Does some new personality measure correlate with the Big 5?

Concurrent
Criterion is in the present

Measure of ADHD and current scholastic behavioral problems

Predictive
Criterion in the future

SAT and college GPA

Validity Criteria in Psychological Testing


Construct-related validity
How much is it an actual measure of the construct of interest?

Convergent
Correlates well with other measures of the construct

Depression scale correlates well with other dep scales

Discriminant
Is distinguished from related but distinct constructs

Dep scale != Stress scale

Validity Criteria in Experimentation


Statistical conclusion validity
Is there a causal relationship between X and Y?
Correlation is our starting point (i.e. correlation isn't causation, but establishing a correlation is a first step toward a causal claim)
Related to this is the question of whether the study was sufficiently sensitive to pick up on the correlation

Internal validity
Has the study been conducted so as to rule out other effects which were controllable?
Poor instruments, experimenter bias

External validity
Will the relationship be seen in other settings?

Construct validity
Same concerns as before
Ex. Is reaction time an appropriate measure of learning?

Summary
Reliability and Validity are key concerns in psychological research
Part of the problem in psychology is the lack of reliable measures of the things we are interested in
Assuming that they are valid to begin with, we must always
press for more reliable measures if we are to progress
scientifically
This means letting go of supposed standards when they are no longer as useful and looking for ways to improve current ones
