Interjudge Reliability and Decision Reproducibility
- 1 December 1994
- journal article
- Published by SAGE Publications in Educational and Psychological Measurement
- Vol. 54 (4) , 913-925
- https://doi.org/10.1177/0013164494054004007
Abstract
The purpose of this article is to discuss the importance of decision reproducibility for performance assessments. Decisions have conventionally been considered reproducible when decisions from two judges about a student's performance on comparable tasks correlate. However, when judges differ in their expectations and tasks differ in difficulty, decisions may not be independent of the particular judges or tasks encountered unless appropriate adjustments for these observable differences are made. In this study, data were analyzed with the Facets model and provided evidence that judges grade differently, whether or not the scores they give correlate well. This outcome suggests that adjustments for differences in judge severity should be made before student measures are estimated, in order to produce reproducible decisions for certification, achievement, or promotion.
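The abstract's central point, that two judges' scores can correlate highly while the judges still differ systematically in severity, can be illustrated with a small simulation. This is a hypothetical sketch in plain Python, not the Facets (many-facet Rasch) analysis used in the article: the judge names, severity gap, and the simple mean-shift adjustment are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical setup (not the article's data): two judges rate the same
# 20 performances; "judge B" is uniformly more severe by about 1 point.
true_ability = [random.gauss(5.0, 1.5) for _ in range(20)]
judge_a = [t + random.gauss(0.0, 0.3) for t in true_ability]        # lenient judge
judge_b = [t - 1.0 + random.gauss(0.0, 0.3) for t in true_ability]  # severe judge

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# The scores correlate highly even though judge B is systematically harsher...
r = pearson(judge_a, judge_b)
gap = sum(a - b for a, b in zip(judge_a, judge_b)) / len(judge_a)

# ...so a crude severity adjustment re-centers judge B before comparing.
# (A Rasch analysis would instead estimate severity jointly with ability.)
judge_b_adjusted = [b + gap for b in judge_b]
```

A pass/fail cut score applied to the raw scores would fail more of judge B's examinees than judge A's, despite the high correlation; only after the severity adjustment do the two judges' decisions become comparable, which is the article's argument for adjusting before estimating student measures.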
This publication has 4 references indexed in Scilit:
- The Measurement of Writing Ability With a Many-Faceted Rasch Model. Applied Measurement in Education, 1992
- Reliability of Ratings for Multiple Judges: Intraclass Correlation and Metric Scales. Applied Psychological Measurement, 1991
- Correcting Performance-Rating Errors in Oral Examinations. Evaluation & the Health Professions, 1991
- Measuring the Impact of Judge Severity on Examination Scores. Applied Measurement in Education, 1990