Interjudge Reliability and Decision Reproducibility
- 1 December 1994
- journal article
- Published by SAGE Publications in Educational and Psychological Measurement
- Vol. 54 (4) , 913-925
- https://doi.org/10.1177/0013164494054004007
Abstract
The purpose of this article is to discuss the importance of decision reproducibility for performance assessments. Decisions have conventionally been considered reproducible when decisions from two judges about a student's performance on comparable tasks correlate. However, when judges differ in their expectations and tasks differ in difficulty, decisions may not be independent of the particular judges or tasks encountered unless appropriate adjustments for these observable differences are made. In this study, data were analyzed with the Facets model and provided evidence that judges grade differently, whether or not the scores they give correlate well. This outcome suggests that adjustments for differences in judge severity should be made before student measures are estimated, in order to produce reproducible decisions for certification, achievement, or promotion.
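The abstract's central point, that two judges' scores can correlate highly while the judges still differ systematically in severity, can be illustrated with a small simulation. This is a hypothetical sketch in plain Python, not the Facets (many-facet Rasch) analysis used in the article: the judge names, severity gap, and the simple mean-shift adjustment are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical setup (not the article's data): two judges rate the same
# 20 performances; "judge B" is uniformly more severe by about 1 point.
true_ability = [random.gauss(5.0, 1.5) for _ in range(20)]
judge_a = [t + random.gauss(0.0, 0.3) for t in true_ability]        # lenient judge
judge_b = [t - 1.0 + random.gauss(0.0, 0.3) for t in true_ability]  # severe judge

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# The scores correlate highly even though judge B is systematically harsher...
r = pearson(judge_a, judge_b)
gap = sum(a - b for a, b in zip(judge_a, judge_b)) / len(judge_a)

# ...so a crude severity adjustment re-centers judge B before comparing.
# (A Rasch analysis would instead estimate severity jointly with ability.)
judge_b_adjusted = [b + gap for b in judge_b]
```

A pass/fail cut score applied to the raw scores would fail more of judge B's examinees than judge A's, despite the high correlation; only after the severity adjustment do the two judges' decisions become comparable, which is the article's argument for adjusting before estimating student measures.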
This publication has 4 references indexed in Scilit:
- The Measurement of Writing Ability With a Many-Faceted Rasch Model. Applied Measurement in Education, 1992
- Reliability of Ratings for Multiple Judges: Intraclass Correlation and Metric Scales. Applied Psychological Measurement, 1991
- Correcting Performance-Rating Errors in Oral Examinations. Evaluation & the Health Professions, 1991
- Measuring the Impact of Judge Severity on Examination Scores. Applied Measurement in Education, 1990