Standard Setting Procedures

RESEARCH METHODOLOGY
Procedures for Establishing Defensible Absolute Passing Scores on Performance Examinations in Health Professions Education
Steven M. Downing, Ara Tekian, Rachel Yudkowsky
Department of Medical Education, University of Illinois at Chicago, Chicago, Illinois, USA
Learning Objectives
By the end of IP we should be able to:
Describe standard setting methods
Differentiate between their types: norm based, criterion based, relative, absolute
Know what a passing score is
Describe the selection of judges (examiners)
Identify the borderline examinee
Understand each method
What do experts say?

"We have come to realize that there is no objectively correct way to set standards. But we have also come to realize that there is nothing wrong with using judgments appropriately." (Zieky, 1995, p. 5)
"Determination of a minimum acceptable performance always involves some rather arbitrary and not wholly satisfactory decisions." (Ebel, 1972, p. 492)
Why do we need standard setting methods?

To determine the standards of performance
To separate the non-competent from the competent
To provide an educational tool for deciding the cut-off point on the score scale
(Reference AMEE Guide No. 18: Standard setting in student assessment)
Essentials
1. Choice of content experts (5-6 or 11-12)
2. Identification of the borderline examinee
3. Cut score
Choice of content expert

Judges should be able to judge examinee performance
Judges should be unbiased, follow the instructions, and understand their task
Judges should be subject experts
Judges should come from varied cultural, ethnic, and religious backgrounds and include both male and female judges
A panel of 5-12 judges is preferable
Borderline examinee
One who has a 50:50 probability of passing or failing the test: sometimes passes the exam and sometimes fails.
Alternatively, the judges decide the characteristics of the borderline examinee.
Cut Score
There is no gold standard for passing scores. The passing score is whatever the judges decide. Different panels of judges may decide on different passing scores for the same exam. It depends on how much is enough to pass: the subject experts decide this by devising a checklist or predetermined key, or it is left open to the judges' judgment.
Problem with judges

Judges tend to expect too much from the borderline examinee.
They set unrealistically high standards, which fail a large proportion of examinees, e.g. in a viva, where examiners' expectations are very high.
This happens when judges decide the cut-off without knowing the actual performance data.
How can we overcome this problem?

Calibration of judges by providing student records showing their overall performance.
Student record
Test: Jan, Feb, Mar, April, May, June, Aug, Sept, Send up
Topics: Thanatology; Autopsy and exhumation; Asphyxia; PI and traumatology
MCQs: Fail, Pass, Pass, Above, Above
SAQs: Fail, Fail, Pass, Above, Pass
OSPE: Below, Pass, Pass, Excellent, Very good
VIVA: Fail, Fail, Borderline, Pass, Good
Passing Rate
0 Not done 1 poor 2 3 Below expectation 4 17 4 borderline 5 6 Meet expectation 35 7 7 Above expectation 3
24
Passed = 64
Passing rate = (number of students who passed / total number of students who appeared) x 100 = (64/100) x 100 = 64%
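The passing-rate arithmetic above can be written as a small helper. This is only an illustrative sketch; the function name `pass_rate` is an assumption made for the example, not something defined in the source.

```python
def pass_rate(passed: int, appeared: int) -> float:
    """Percentage of examinees who passed (hypothetical helper name)."""
    if appeared <= 0:
        raise ValueError("appeared must be a positive number of examinees")
    return 100.0 * passed / appeared


# Figures from the slide: 64 students passed out of 100 who appeared.
rate = pass_rate(64, 100)
print(f"Passing rate: {rate:.0f}%")
```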
Standard setting methods

[Diagram: classification of standard setting methods into relative (norm based), absolute (criterion based; item based or performance based), and relative-absolute compromise (Hofstee). Methods shown: Modified Angoff, Ebel, Original Angoff.]
Judgments of the judges are combined to determine the passing score.

Angoff Method

Angoff Passing Score

Item        1     2     3     4     5     6     7     8     9    10
Rater 1   .80   .70   .50   .70   .75   .60   .50   .70   .45   .60
Rater 2   .87   .75   .63   .68   .70   .65   .58   .78   .50   .69
Rater 3   .85   .80   .55   .70   .80   .80   .55   .75   .50   .65
Rater 4   .90   .85   .60   .70   .85   .75   .60   .75   .45   .65
Rater 5   .80   .75   .65   .65   .70   .65   .70   .65   .43   .65
Rater 6   .95   .85   .60   .70   .85   .85   .90   .80   .55   .70
Rater 7   .85   .75   .60   .70   .80   .80   .60   .70   .45   .70
Mean (M)  .86   .78   .59   .69   .78   .73   .63   .73   .48   .66

Sum of item means = 6.93
Raw passing score = sum of item means = 6.93
Percent passing score = 100% x (sum of item means / number of items) = 100% x (6.93/10) = 69.30%
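The Angoff calculation can be checked with a short script. The ratings below are the seven raters' values from the Angoff table in this section; the function and variable names are assumptions made for this sketch.

```python
# One row per rater, one column per item (values from the Angoff table).
ratings = [
    [.80, .70, .50, .70, .75, .60, .50, .70, .45, .60],  # Rater 1
    [.87, .75, .63, .68, .70, .65, .58, .78, .50, .69],  # Rater 2
    [.85, .80, .55, .70, .80, .80, .55, .75, .50, .65],  # Rater 3
    [.90, .85, .60, .70, .85, .75, .60, .75, .45, .65],  # Rater 4
    [.80, .75, .65, .65, .70, .65, .70, .65, .43, .65],  # Rater 5
    [.95, .85, .60, .70, .85, .85, .90, .80, .55, .70],  # Rater 6
    [.85, .75, .60, .70, .80, .80, .60, .70, .45, .70],  # Rater 7
]


def angoff_passing_score(ratings):
    """Return (item means, raw passing score, percent passing score)."""
    n_raters = len(ratings)
    n_items = len(ratings[0])
    # Mean judged proportion correct for each item, across raters.
    item_means = [sum(row[i] for row in ratings) / n_raters for i in range(n_items)]
    raw = sum(item_means)              # raw passing score = sum of item means
    percent = 100.0 * raw / n_items    # expressed as a percentage of the items
    return item_means, raw, percent


item_means, raw, percent = angoff_passing_score(ratings)
print([round(m, 2) for m in item_means])
print(round(raw, 2), round(percent, 2))
```

Rounding the item means to two decimals reproduces the M row of the table, and the raw and percent scores agree with 6.93 and 69.30%.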
Angoff's method - 2
1. Read the first item.
2. Estimate the proportion of the borderline group that would respond correctly.
3. Record the ratings, discuss them, and change them if needed.
4. Repeat this for each item.
5. Calculate the passing score by:
   adding the rating score of each item separately (modified Angoff), e.g. in FCPS the examinee has to satisfy all judges
   adding the performance on all stations (original Angoff), e.g. OSCE
Ebel's Method
Judges define the checklist and rating scale
Categorize items by relevance: essential, important, acceptable
Rate items by difficulty: easy, medium, hard
Judges define the borderline performance needed to pass (0-100%)
Ebel's method: classification grid
Columns (difficulty): Easy, Medium, Hard
Rows (relevance): Essential, Important, Acceptable
Ebel's Method
Judges estimate the percentage of items in each category that borderline test-takers would answer correctly
Calculate the passing score
Ebel's method
% of items that borderline examinees answer correctly

Relevance    Easy   Medium   Hard
Essential    95%    60%      40%
Important    90%    56%      34%
Acceptable   80%    60%      50%
Items

Relevance    Easy (item #; % correct)   Medium (item #; % correct)   Hard (item #; % correct)   Weighted sum
Essential    4, 5; 93                   81                           63                         2(.93) + .81 + .63 = 3.30
Important    89                         10; 76                       59                         .89 + .76 + .59 = 2.24
Acceptable   N/A                        62                           6, 8; 42                   .62 + 2(.42) = 1.46

Total = 3.30 + 2.24 + 1.46 = 7.00
Passing score = total x 100 / number of items = 7.00 x 100 / 10 = 70%
"% correct" is the mean of all the judges' estimates of the percentage of borderline examinees who would answer the item correctly.
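The Ebel arithmetic can be sketched the same way. Each cell holds the number of items in that relevance/difficulty category and the judges' mean estimate for it, taken from the worked example in this section; the dictionary layout and the function name are assumptions made for illustration.

```python
# (relevance, difficulty) -> (number of items, mean judged proportion correct)
cells = {
    ("essential", "easy"): (2, 0.93),
    ("essential", "medium"): (1, 0.81),
    ("essential", "hard"): (1, 0.63),
    ("important", "easy"): (1, 0.89),
    ("important", "medium"): (1, 0.76),
    ("important", "hard"): (1, 0.59),
    ("acceptable", "medium"): (1, 0.62),
    ("acceptable", "hard"): (2, 0.42),
}


def ebel_passing_score(cells):
    """Return (expected items correct for a borderline examinee, percent score)."""
    n_items = sum(n for n, _ in cells.values())
    # Weight each category's judged proportion by the number of items in it.
    expected_correct = sum(n * p for n, p in cells.values())
    return expected_correct, 100.0 * expected_correct / n_items


total, percent = ebel_passing_score(cells)
print(round(total, 2), round(percent, 1))
```

With the ten items of the example, the weighted total comes to 7.00 and the passing score to 70%.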
Absolute
Criterion based Norm based
Criterion-referenced methods:
Based on how much the examinees know.
Candidates pass or fail depending on whether they meet specified criteria.
In criterion-referenced tests (CRTs), the performance of each examinee is compared to a predefined set of criteria or a standard. The goal of these tests is to determine whether the candidate has demonstrated mastery of a certain skill or set of skills.
For example, a national board medical exam is a CRT: either the examinee has the skills to practice the profession, in which case he or she is licensed, or does not.
Another example: examinees must correctly answer 70% of the questions.
REF : NORM-REFERENCED VS. CRITERION-REFERENCED TESTING May 22nd, 2008 by Danielle, Director of Sales and Marketing, Language Testing
Criterion referenced standard

[Figure: test score distributions for a poor group, an average group, and a good group, with the criterion-referenced standard fixed at 50%.]
Criterion based
Based on criteria that are already set, e.g.:
33% passing score in FA, BA exams
50% passing score in MBBS exams
60% passing score at postgraduate level
80% passing score in skills exams
Criterion based methods: borderline group, contrasting groups
Contrasting Groups
Performance is judged by checklist or rating scale.
Students are divided into expert and non-expert groups based on the rating scale.
The score distributions of the two groups are plotted on a graph.
The passing score is set at the intersection of the two distributions, the trade-off point between false positives and false negatives.
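A minimal sketch of the contrasting groups idea follows. Instead of reading the intersection off a graph, it picks the cut score that minimizes the total number of misclassified examinees, which is a common practical stand-in for the crossing point but is a substitution made for this example, not the procedure stated on the slide. All scores and names are hypothetical.

```python
def contrasting_groups_cut(non_expert, expert):
    """Return (cut score, misclassifications) minimizing false positives + false negatives."""
    candidates = sorted(set(non_expert) | set(expert))
    best_cut, best_errors = None, None
    for cut in candidates:
        false_pos = sum(s >= cut for s in non_expert)  # non-experts who would pass
        false_neg = sum(s < cut for s in expert)       # experts who would fail
        errors = false_pos + false_neg
        # Strict comparison keeps the lowest cut score when several tie.
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical checklist scores for the two groups.
non_expert_scores = [40, 45, 50, 52, 55, 58, 60, 62]
expert_scores = [56, 61, 64, 66, 70, 72, 75, 80]

cut, errors = contrasting_groups_cut(non_expert_scores, expert_scores)
print(cut, errors)
```

Raising the cut score above this point reduces false positives at the cost of more false negatives, which is the trade-off the judges have to weigh.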
Compromise Methods
Advantages
Easy to implement
Educators are comfortable with the decisions
Disadvantages
The cut score may not fall in the area defined by the judges' estimates
The method is not the first choice in a high-stakes testing situation
Borderline Group
Examinee centered.
The performance of the examinee is judged overall.
Faculty directly observe the performance, e.g. in an OSCE.
Each judge observes multiple examinees on the same station.
Judges use a global rating scale: 1 = fail, 2 = borderline, 3 = pass.
The mean checklist score of the borderline examinees becomes the passing score.
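The borderline group rule can be sketched in a few lines. The records below are hypothetical (checklist score, global rating) pairs for one station, using the 1 = fail, 2 = borderline, 3 = pass scale described above; the names are assumptions made for illustration.

```python
from statistics import mean

BORDERLINE = 2  # global rating scale: 1 = fail, 2 = borderline, 3 = pass


def borderline_group_cut(records):
    """Mean checklist score of the examinees rated borderline."""
    borderline_scores = [score for score, rating in records if rating == BORDERLINE]
    if not borderline_scores:
        raise ValueError("no examinee was rated borderline")
    return mean(borderline_scores)


# Hypothetical data: (checklist score, global rating) for each examinee.
records = [(45, 1), (58, 2), (62, 2), (60, 2), (75, 3), (80, 3), (52, 1), (64, 2)]

passing_score = borderline_group_cut(records)
print(passing_score)
```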
Types of Standards
Norm-referenced methods (NRT):
Based on a comparison among the performances of examinees; that is, an examinee's performance is compared with that of the other examinees.
Standardized examinations such as the SAT are norm-referenced tests. The goal is to rank the examinees so that decisions about their opportunity for success (e.g. college entrance) can be made.
Example: the normal distribution (bell curve).
A set proportion of candidates fails regardless of how well they perform, e.g. the top 84% pass.
Norm based:
The cut-off score is not predefined.
Passing and failing examinees are identified by comparing their performance.
Example: an OSCE with 10 stations, a total score of 100, and 5 examinees.

Examinee   Score (out of 100)
1          50
2          70
3          90
4          80
5          30

Mean = 320/5 = 64, so 64 is the cut-off between pass and fail.
(REF: Medical Knowledge Using Progress Tests A.M.M. Muijtjens, R.J.I. Hoogenboom, G.M. Verwijnen, C.P.M. van der Vleuten)
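The norm based example can be reproduced with the five OSCE scores from this section. Treating a score equal to the mean as a pass is an assumption made for this sketch; the slide only says that 64 separates pass from fail.

```python
from statistics import mean

# Examinee number -> OSCE score out of 100 (values from the example).
scores = {1: 50, 2: 70, 3: 90, 4: 80, 5: 30}

# The cut-off is derived from the group itself: here, the group mean.
cut_score = mean(scores.values())

# Assumption: a score at or above the cut-off counts as a pass.
passed = sorted(e for e, s in scores.items() if s >= cut_score)
failed = sorted(e for e, s in scores.items() if s < cut_score)

print(cut_score, passed, failed)
```

Because the cut-off depends on the group, the same score of 70 could pass in this cohort and fail in a stronger one.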
Norm-referenced standard
[Figure: test score distribution with the values 30%, 50%, and 80% marked.]
Hofstee Method (relative absolute compromise method)

Judges are asked to define the minimum and maximum passing scores and failure rates, e.g.:
81-100%: outstanding
71-80%: above expectation
61-70%: meets expectation (max pass score)
56-60%: top borderline
51-55%: bottom borderline (min pass score)
40-50%: below expectation
20-39%: performed incorrectly
0-19%: don't know
Hofstee Method
Graphical presentation
Judges predefine the acceptable fail rate, e.g. minimum 6 and maximum 20 students to fail, and the acceptable lowest and highest pass scores.

Judge   Lowest pass score (%)   Highest pass score (%)
J-1     62                      72
J-2     57                      67
J-3     51                      73
J-4     55                      65
J-5     52                      60
J-6     59                      71
Mean    56                      68
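Hofstee Graph
[Graph: cumulative % of examinees against scores, with the minimum (56%) and maximum (68%) pass scores and the minimum (6%) and maximum (20%) fail rates marked. The actual passing score read from the graph is 61%.]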
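The graphical Hofstee step can be approximated numerically. The sketch below uses the judges' mean bounds from this section (pass score 56% to 68%, fail rate 6% to 20%), draws the diagonal from (minimum pass score, maximum fail rate) to (maximum pass score, minimum fail rate), and returns the first candidate cut score at which the observed fail rate reaches that diagonal. The examinee scores, the step size, and the function name are all hypothetical, so the result will not match the 61% read from the slide's graph.

```python
def hofstee_cut(scores, min_cut, max_cut, min_fail, max_fail, step=0.5):
    """First cut score where the observed fail rate (%) meets the Hofstee diagonal."""
    n = len(scores)
    n_steps = int(round((max_cut - min_cut) / step))
    for i in range(n_steps + 1):
        cut = min_cut + i * step
        # Observed cumulative fail rate if the passing score were set at `cut`.
        observed = 100.0 * sum(s < cut for s in scores) / n
        # Diagonal falls from max_fail at min_cut to min_fail at max_cut.
        diagonal = max_fail + (min_fail - max_fail) * (cut - min_cut) / (max_cut - min_cut)
        if observed >= diagonal:
            return cut
    return None  # the curve never meets the diagonal inside the judges' box


# Hypothetical percent scores for 20 examinees.
scores = [45, 52, 58, 60, 62, 63, 64, 66, 67, 69,
          70, 71, 72, 74, 75, 77, 80, 83, 88, 92]

cut = hofstee_cut(scores, min_cut=56, max_cut=68, min_fail=6, max_fail=20)
print(cut)
```

Returning `None` when the curve misses the box mirrors the known weakness of the method: the judges' bounds may not contain a workable cut score for the actual score distribution.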
Compensatory Vs Non compensatory

Compensatory
Poor performance on one station can be compensated for by good performance on other stations.
The overall score is the average of the performance on all the stations.
E.g. SAQs, MMI, OSCE

Non compensatory
The student should reach the minimum level of competence on each station.
The student has to meet predefined criteria on each station to pass.
E.g. OSATS, DOPS, Mini CEX
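The difference between the two scoring rules can be shown on one examinee. The station scores and the cut score of 50 are hypothetical, and applying the same cut score to every station is a simplifying assumption for this sketch.

```python
def compensatory_pass(station_scores, cut):
    """Pass if the average across all stations reaches the cut score."""
    return sum(station_scores) / len(station_scores) >= cut


def non_compensatory_pass(station_scores, cut):
    """Pass only if every single station reaches the cut score."""
    return all(score >= cut for score in station_scores)


# Hypothetical examinee: strong overall, but weak on the second station.
stations = [70, 40, 65, 80]

print(compensatory_pass(stations, cut=50))      # average is 63.75
print(non_compensatory_pass(stations, cut=50))  # second station is below 50
```

The same examinee passes under the compensatory rule and fails under the non compensatory rule, so the choice of rule is part of the standard.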
Comparison

Method       Judgment focused on        Requires performance data   Direct observation   Timing of judgment
Angoff       Test items / performance   No                          No                   Before exam
Ebel         Test items                 Yes                         No                   After exam
Hofstee      Whole test                 Yes                         No                   After exam
Borderline   Examinee performance       No                          Yes                  During exam
Contrast     Examinee performance       No                          Yes                  During exam
Summary:
1. All standard-setting is judgmental
2. Standard-setting leads to errors of classification
3. Standard-setting is and will remain controversial
4. There is no purely absolute standard
5. There is no one right method
6. Choosing judges is more important than choosing methods
Summary
If experts use a rating scale or checklist for assessment, you can choose the borderline group or contrasting groups method.
If you do not have experts rating the exam, you can choose the Angoff, Ebel, or Hofstee method.
Critique
This article describes standard setting only for performance based exams, i.e. OSCE, OSATS, DOPS.
The classification of standards is somewhat confusing: the standards overlap, with no clear demarcation.
These methods can be applied with some modifications.
It does not discuss the percentile method.
References
AMEE Guide No. 18: Standard setting in student assessment.
Berk, R.A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56, 137-172.
Cizek, G.J. (2001). Setting Performance Standards: Concepts, Methods, and Perspectives. Mahwah, NJ: Lawrence Erlbaum Associates.
Jaeger, R.M. (1989). Certification of student competence. In R.L. Linn (Ed.), Educational Measurement. New York: American Council on Education and Macmillan Publishing Company.
Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64, 425-461.
Livingston, S.A. and Zieky, M.J. (1982). Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.
Norcini, J.J. and Guille, R.A. (2002). Combining tests and setting standards. In Norman, G., van der Vleuten, C., and Newble, D. (Eds.), International Handbook of Research in Medical Education (pp. 811-834). Dordrecht: Kluwer Press.
Norcini, J.J. (2003). Setting standards on educational tests. Medical Education, 37, 464-469.
Norcini, J.J. and Shea, J.A. (1997). The credibility and comparability of standards. Applied Measurement in Education, 10, 39-59.
Zieky, M.J. (2001). So much has changed: How the setting of cutscores has evolved since the 1980s. In G.J. Cizek (Ed.), Setting Performance Standards: Concepts, Methods, and Perspectives (pp. 19-52). Mahwah, NJ: Lawrence Erlbaum Associates.
