Farah Bahrouni/Mr
bahrouni@squ.edu.om
Plan
Briefing about MFRM
Running the analysis for 5 facets: candidate, rater, background, experience & category
Adjusting scores as per FACETS estimates
Conclusion
Student 1 (Total: 100; TA: 25, CC: 25, LR: 25, GR: 25)
        TA        CC        LR        GR        Total
Mean    19.62132  19.38971  18.20956  16.45588
Max     25        24        23        22        94
Min     14        13        14        10        51
Range   11        11        9         12        43
Count   68        68        68        68

Student 2
Mean    20.13971  20.09926  19.88235  18.88971
Max     25        25        25        24        99
Min     14        13        12        11        50
Range   11        12        13        13        49
Count   68        68        68        68

Student 3
Mean    15.16544  15.79559  15.48162  18.88971
Max     25        23        20        24        92
Min     10        10        8         11        39
Range   15        13        12        13        53
Count   68        68        68        68
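The Range and Total figures for Student 1 can be re-derived from the Max and Min rows. The short Python check below uses only the Student 1 values reported above; the variable names are illustrative.

```python
# Max and Min per criterion for Student 1, as reported above.
criteria = ["TA", "CC", "LR", "GR"]
max_scores = {"TA": 25, "CC": 24, "LR": 23, "GR": 22}
min_scores = {"TA": 14, "CC": 13, "LR": 14, "GR": 10}

# Range per criterion is Max minus Min.
ranges = {c: max_scores[c] - min_scores[c] for c in criteria}

# The Total figures are the sums of the per-criterion values.
total_max = sum(max_scores.values())
total_min = sum(min_scores.values())
total_range = total_max - total_min

print(ranges)                             # {'TA': 11, 'CC': 11, 'LR': 9, 'GR': 12}
print(total_max, total_min, total_range)  # 94 51 43
```

A spread of 9 to 12 marks out of 25 on every criterion for a single student is the kind of variability the rest of the talk addresses.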
Assessment of language proficiency (speaking/writing) involves subjectivity: a number of distinct factors directly or indirectly impinge upon the assessment/measurement outcomes.
Facet: any factor, variable, or component (e.g. examinees, tasks, raters, interviewers, etc.) of the measurement situation that is assumed to affect test scores in a systematic way.
(Bachman, 2004; Linacre, 2002; Wolfe & Dobria, 2008, cited in Eckes, 2009: 2)
The usual approaches to dealing with rater variability include:
rater training
using 2 or more raters in the scoring of performance assessment
calling for an adjudicator (a 3rd/4th rater, usually a more experienced, senior or expert one)
developing rubrics that spell out the proficiency levels
identifying anchor papers to provide concrete examples of each proficiency level
(For details, see Johnson et al., 2000, 2001, 2003, 2005)
Nevertheless, research has found that none of these methods is effective enough to guarantee reliable, objective scores. The score resolution methods are also diverse enough to raise questions about the quality of the resolved scores.
Underlying these resolution models is the common assumption that the discrepant scores might lack the requisite levels of reliability and validity, and that adjudication might improve this deficit to some extent (Johnson et al., 2005: 123).
As for rater training, it has been found that even with proper training, substantial differences between raters persist.
(Linacre, 1990; Hamp-Lyons, 1991; Weigle, 1994, 1998, 2002; Lumley & McNamara, 1995; McNamara, 1996; Lumley, 2005)
Rater differences are reduced by training, but they do persist (McNamara, 1996: 118). Reason:
Some see severity much as a personality trait that is inherently brought to any rating situation.
(Myford et al., 2003)
Farah Bahrouni/LC Conf./April 20, 2011
The multi-facet Rasch model (MFRM) provides a rich set of highly flexible tools to account, and compensate, for measurement error, especially rater-dependent measurement error.
It is an extension of the basic Rasch model that incorporates more facets than the 2 usually included in dichotomous item tests, i.e. candidates and items.
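For reference, one common way of writing the model with three facets (candidate, item and rater) is the rating scale form below. The symbols are notational assumptions, not taken from the slides: \theta_n is the ability of candidate n, \delta_i the difficulty of item i, \alpha_j the severity of rater j, and \tau_k the difficulty of being awarded category k rather than k-1.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
```

Dropping the rater term and the threshold term leaves the basic two-facet model, \ln(P_{ni1}/P_{ni0}) = \theta_n - \delta_i.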
Multi-facet Rasch measurement is a stochastic model; the analysis is run with FACETS, a computer program developed by Linacre (1989).
Candidate ability is estimated from all ratings given by all raters on all items (Lunz & Wright, 1997; McNamara, 1996: 132).
Item difficulty (TA, CC, LR & GR) is estimated from all responses across all candidates to that item (ibid).
Rater severity is estimated from all ratings given across all candidates and items (ibid).
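To make the roles of the three estimates concrete, here is a minimal Python sketch of the category probabilities under a three-facet rating scale model. It is not FACETS code and it does not estimate anything: the function names, the logit values and the thresholds are all hypothetical, chosen only to show how candidate ability, item difficulty and rater severity combine.

```python
import math

def category_probabilities(ability, difficulty, severity, thresholds):
    """P(score = k) for k = 0..len(thresholds); all arguments in logits."""
    # Category k's log-numerator is the sum over steps 1..k of
    # (ability - difficulty - severity - threshold); category 0 is 0.
    log_numerators = [0.0]
    for tau in thresholds:
        step = ability - difficulty - severity - tau
        log_numerators.append(log_numerators[-1] + step)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(ability, difficulty, severity, thresholds):
    """Probability-weighted mean score for one candidate-item-rater triple."""
    probs = category_probabilities(ability, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical values (logits), for illustration only.
thresholds = [-1.0, 0.0, 1.0]   # a 4-category scale: scores 0..3
severe = expected_score(0.5, 0.0, 1.0, thresholds)    # severe rater
average = expected_score(0.5, 0.0, 0.0, thresholds)   # rater of average severity
print(severe < average)  # True
```

The same candidate on the same item is expected to score lower with the more severe rater. Evaluating the expected score at a common severity value is one way to picture what compensating for rater severity means.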
Fit analysis
Bias analysis
Shows how the individual elements within the facets interact, i.e. the individual-level effects of the various elements (z-score values between -2 and +2 indicate no significant bias).
Thus, source(s) of variation in the scores are efficiently determined.
(Myford, et al. 2003; Lunz & Wright, 1997)
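The ±2 screening rule can be sketched in a few lines of Python. This assumes that each interaction (for example, rater by criterion) comes with a bias estimate in logits and a standard error, and that the z-score is the estimate divided by its standard error. The labels and numbers are invented for illustration and are not results from the study.

```python
def flag_bias(interactions, limit=2.0):
    """Return (label, z) for interactions whose z-score falls outside +/- limit."""
    flagged = []
    for label, bias, std_error in interactions:
        z = bias / std_error
        if abs(z) > limit:
            flagged.append((label, round(z, 2)))
    return flagged

# Hypothetical (label, bias in logits, standard error) triples.
interactions = [
    ("Rater A x TA", 0.30, 0.25),
    ("Rater B x GR", -0.90, 0.30),
    ("Rater C x LR", 0.10, 0.20),
]
print(flag_bias(interactions))  # [('Rater B x GR', -3.0)]
```

Only the second interaction lies outside the band, so it is the one that would be inspected as a possible source of variation in the scores.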
Conclusion
Owing to the above features, MFRM has been found to be a model with great potential to improve our capacity to produce objective measures of the ability of test takers in performance assessment contexts. It is practical and can be used in our context alongside pair rating.
(Linacre et al., 1990; Engelhard, 1991, 1992, 1994, 1996; Engelhard & Myford, 2003; Hamp-Lyons, 1991; Lunz, 1996, 1997a, 1997b; Lunz & Wright, 1997; Weigle, 1994, 1998, 2002; Schaefer, 2003, 2008; Kondo-Brown, 2002; Lumley & McNamara, 1995; Lumley, 2005; McNamara, 1991, 1996, 1997, 2000, 2002, 2008; McNamara & Roever, 2006; Myford et al., 2003, 2004; Shaw & Weir, 2007; Wigglesworth, 1993, 1994).