Assessing and Evaluating
Learning
Outline:
Introduction
Definition of assessment and evaluation
Aim of student evaluation
Steps in student evaluation
The basic principles of assessment/evaluation
Regulation of learning by the teacher
Types of evaluation
Qualities of a test
Characteristics of measurement instrument
Advantages and disadvantages of different types of tests
Introduction
Definition of evaluation:
Evaluation is the process of analyzing,
reflecting upon, and summarizing
assessment information, and making
judgments and/or decisions based on the
information collected.
Aim of student evaluation
Incentive to learn
Feedback to student
Modification of learning activities
Selection of students
Success or failure
Feedback to teacher
Protection of society
Types of evaluation
1- Formative evaluation:
It is an ongoing classroom process that keeps
students and educators informed of students'
progress toward the program's learning
objectives.
Types of Validity
Content Validity. Content validity means
the extent to which the content or topics of
the test are truly representative of the course.
It involves, essentially, the systematic
examination of the test content to determine
whether it covers a representative sample of
the behaviour domain to be measured. It is
very important that the behaviour domain to
be tested be systematically analysed to make
certain that all major aspects are covered by
the test items, and in the correct proportions.
CONTENT VALIDITY
Content validity is established by the
relevance of a test to different types of
criteria, such as thorough judgment and
systematic examination of relevant course
syllabi and textbooks, pooled judgment of
subject-matter experts, statements of
behavioural objectives, and analysis of
teacher-made test questions, among others.
Thus content validity depends on the
relevance of the individual's responses to the
behaviour area under consideration, rather
than on the apparent relevance of the item
content.
Content validity
Content validity is commonly used in
evaluating achievement tests. A well-
constructed achievement test should cover
the objectives of instruction, not just its
subject matter. Bloom's Taxonomy of
Educational Objectives is of great help in
listing the objectives to be covered in an
achievement test. Content validity is
particularly appropriate for criterion-
referenced measures.
CONSTRUCT
A construct is an abstract concept, such as
intelligence, self-concept, motivation,
aggression or creativity, that cannot be
observed directly but can be measured by
some type of instrument.
ILLUSTRATION
For example, a teacher wishes to establish
the construct validity of an IQ measure using
the Culture Fair Intelligence Test. He
hypothesizes that students with high IQ also
have high achievement and those with low
IQ, low achievement. He therefore
administers both the Culture Fair Intelligence
Test and an achievement test to groups of
students. If students with high IQ have high
scores in the achievement test and those with
low IQ have low scores, the hypothesis is
supported and construct validity is
established.
A test's construct validity is often assessed
by its convergent and discriminant validity.
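As an illustration, convergent and discriminant validity can be checked by correlating a new test with an established measure of the same construct (expecting a high correlation) and with a measure of a different construct (expecting a low one). A minimal sketch in Python; the scale names and scores below are invented purely for illustration:

```python
# Hypothetical scores for eight students: a new anxiety scale, an established
# anxiety measure (same construct), and a reading-speed test (different construct).
new_scale   = [20, 35, 28, 40, 22, 31, 26, 38]
established = [22, 33, 30, 41, 20, 29, 27, 36]
reading     = [300, 295, 310, 290, 285, 305, 300, 288]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

convergent = pearson_r(new_scale, established)   # same construct: expect high r
discriminant = pearson_r(new_scale, reading)     # different construct: expect low r
```

With these made-up data the convergent correlation comes out high and the discriminant correlation near zero, which is the pattern that supports construct validity.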
FACTORS AFFECTING
VALIDITY
1. Test-related factors
2. The criterion to which you
compare your instrument may
not be well enough established
3. Intervening events
4. Reliability
RELIABILITY
Reliability means the extent to which a test
is dependable, self-consistent and stable. In
other words, the test agrees with itself. It is
concerned with the consistency of responses
from moment to moment: if a person takes
the same test twice, the test should yield the
same results. However, a reliable test may
not always be valid.
RELIABILITY
The consistency of measurements
A RELIABLE TEST
Produces similar scores across various
conditions and situations, including
different evaluators and testing
environments.
How do we account for an individual
who does not get exactly the same
test score every time he or she takes
the test?
1. Test-taker's temporary psychological or
physical state
2. Environmental factors
3. Test form
4. Multiple raters
RELIABILITY COEFFICIENTS
The statistic for expressing
reliability.
Expresses the degree of
consistency in the measurement
of test scores.
Denoted by the letter r with two
identical subscripts (rxx)
RELIABILITY
For instance, Student C took a chemistry test
twice. His answer to item 5, "What is the
neutral pH?", was 6.0. In the second
administration of the same test and question,
his answer was still 6.0; thus, his response is
reliable but not valid. His answer is reliable
because of the consistency of his responses
(6.0), but not valid because the answer is
wrong: the correct answer is pH 7.0. Hence,
a reliable test may not always be valid.
METHODS IN TESTING THE RELIABILITY OF
GOOD MEASURING INSTRUMENT
TEST-RETEST METHOD. The same
measuring instrument is administered twice to the same
group of students and the correlation coefficient between
the two sets of scores is determined. The limitations of this
method are: (1) when the time interval is short, the
respondents may recall their previous responses, which
tends to make the correlation coefficient high; (2) when the
time interval is long, factors such as unlearning and
forgetting may occur and may result in a low correlation;
and (3) regardless of the time interval separating the two
administrations, varying environmental conditions such as
noise, temperature and lighting may affect the correlation
coefficient of the measuring instrument.
SPLIT-HALF RELIABILITY
Sometimes referred to as
internal consistency
Indicates that subjects' scores
on one half of the test consistently
match their scores on the other half
Formula (Spearman-Brown):
r_wt = 2(r_ht) / (1 + r_ht)
Where r_wt is the reliability of the whole test;
and r_ht is the reliability of the half test.
For instance, a test is administered to a pilot sample of
students to estimate the reliability coefficient from the odd-
and even-numbered items.
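The split-half procedure just described can be sketched in Python: score the odd and even items separately, correlate the two half-test totals, then step the coefficient up with the Spearman-Brown formula. The item scores below are invented for illustration:

```python
# Hypothetical item scores (1 = correct, 0 = wrong) for eight students
# on a 10-item test.
items = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

odd  = [sum(row[0::2]) for row in items]   # totals on items 1, 3, 5, 7, 9
even = [sum(row[1::2]) for row in items]   # totals on items 2, 4, 6, 8, 10

r_ht = pearson_r(odd, even)        # reliability of the half test
r_wt = 2 * r_ht / (1 + r_ht)       # Spearman-Brown: reliability of the whole test
```

Note that the Spearman-Brown correction always steps a positive half-test coefficient up, since a full-length test samples the behaviour domain more thoroughly than either half.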
TARGET BEHAVIOR
A specific behavior the observer is
looking to record
Interpretation of Correlation Coefficient Values
An r from +0.00 to +0.20 denotes negligible correlation
An r from +0.21 to +0.40 denotes low correlation
An r from +0.41 to +0.70 denotes marked or moderate correlation
An r from +0.71 to +0.90 denotes high correlation
An r from +0.91 to +0.99 denotes very high correlation
An r of +1.00 denotes perfect correlation
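The scale above translates directly into a small helper function (a sketch; the labels simply follow the table above):

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal labels in the scale above."""
    r = abs(r)  # the scale applies to the magnitude of the coefficient
    if r <= 0.20:
        return "negligible"
    if r <= 0.40:
        return "low"
    if r <= 0.70:
        return "marked or moderate"
    if r <= 0.90:
        return "high"
    if r < 1.00:
        return "very high"
    return "perfect"

print(interpret_r(0.84))   # high
```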
OBTAINED SCORE
The score you get when you administer a test
Consists of two parts: the true score and the
error score
STANDARD ERROR of
MEASUREMENT (SEM)
Gives the margin of error that you should
expect in an individual test score because of
the imperfect reliability of the test
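The usual formula is SEM = SD × √(1 − rxx), where SD is the standard deviation of the test scores and rxx the reliability coefficient. A short sketch with hypothetical numbers:

```python
import math

def sem(sd, r_xx):
    """Standard error of measurement: SD of the test times sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - r_xx)

# Hypothetical test: standard deviation 10, reliability coefficient 0.91.
margin = sem(10, 0.91)
print(round(margin, 1))   # 3.0

# A student with an obtained score of 75 would be expected to score
# within 75 +/- 3 (one SEM on either side) about 68% of the time.
```

Notice that a perfectly reliable test (rxx = 1) gives SEM = 0: the obtained score would then equal the true score with no error component.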
1- Oral examinations:
Advantages
1. Provide direct personal contact with candidates.
2. Provide opportunity to take mitigating circumstances into
account.
3. Provide flexibility in moving from candidate's strong points to
weak areas.
4. Require the candidate to formulate his own replies without cues.
5. Provide opportunity to question the candidate about how he
arrived at an answer.
6. Provide opportunity for simultaneous assessment by two
examiners.
1- Oral examinations
Disadvantages
1. Lack standardization.
2. Lack objectivity and reproducibility of results.
3. Permit favoritism and possible abuse of the
personal contact.
4. Suffer from undue influence of irrelevant factors.
5. Suffer from shortage of trained examiners to
administer the examination.
6. Are excessively costly in terms of professional time
in relation to the limited value of the information they
yield.
2- Practical examinations
Advantages
1. Provide opportunity to test in realistic setting skills
involving all the senses while the examiner observes and
checks performance.
2. Provide opportunity to confront the candidate with
problems he has not met before both in the laboratory and
at the bedside, to test his investigative ability as opposed
to his ability to apply ready-made "recipes".
3. Provide opportunity to observe and test attitudes and
responsiveness to a complex situation (videotape
recording).
4. Provide opportunity to test the ability to communicate
under pressure, to discriminate between important and
trivial issues, and to arrange the data in a final form.
2- Practical examinations
Disadvantages
1. Lack standardized conditions in laboratory
experiments using animals, in surveys in the
community or in bedside examinations with patients of
varying degrees of cooperativeness.
2. Lack objectivity and suffer from intrusion of
irrelevant factors.
3. Are of limited feasibility for large groups.
4. Entail difficulties in arranging for examiners to
observe candidates demonstrating the skills to be
tested.
3- Essay examinations
Advantages
1. Provide candidate with opportunity to demonstrate
his knowledge and his ability to organize ideas and
express them effectively
Disadvantages
1. Limit severely the area of the student's total work
that can be sampled.
2. Lack objectivity.
3. Provide little useful feedback.
4. Take a long time to score
4- Multiple-choice questions
Advantages
1. Ensure objectivity, reliability and validity; preparation of
questions with colleagues provides constructive
criticism.
2. Increase significantly the range and variety of facts that
can be sampled in a given time.
3. Provide precise and unambiguous measurement of the
higher intellectual processes.
4. Provide detailed feedback for both student and teachers.
5. Are easy and rapid to score.
4- Multiple-choice questions
Disadvantages
1. Take a long time to construct in order to
avoid arbitrary and ambiguous questions.
2. Also require careful preparation to avoid
preponderance of questions testing only
recall.
3. Provide cues that do not exist in practice.
4. Are "costly" where number of students is
small.