Correcting Performance-Rating Errors in Oral Examinations

1 March 1991

journal article
Published by SAGE Publications in Evaluation & the Health Professions

Vol. 14 (1), 100-122
https://doi.org/10.1177/016327879101400107

Abstract

Although oral examinations are widely used for making decisions regarding an individual's level of competence, they are frequently of limited reliability. A significant part of the error in oral performance ratings is due to the tendency for some evaluators to be lenient and others to be stringent in their assignment of ratings. This article describes and evaluates a simple method to identify and correct for errors of leniency and stringency. The method, which is based on a regression model recommended by Wilson (1988), extends and simplifies the procedures recommended by Cason and Cason (1984, 1985). The method provides an estimate of each individual's performance that has been corrected for errors of leniency and stringency. In addition, it produces for each rater an index of leniency or stringency and several other statistics useful in evaluating the properties of rating data. The regression method is applied to performance ratings from three separate administrations of an oral examination in a medical specialty. The results indicate modest but significant levels of leniency and stringency error; correcting for such errors would change the pass/fail decisions for about 6% of the examinees. Limitations of the procedure, as well as the need for additional research, are discussed.
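
The abstract does not give the model's equations, so the following is only a rough sketch of how a regression-based leniency correction could work, not the article's actual procedure. It assumes a simple additive model in which each rating is the examinee's true performance plus a rater-specific leniency offset plus error, fitted by ordinary least squares with rater offsets constrained to average zero. The function name `correct_ratings`, the example data, and the identification constraint are all illustrative assumptions.

```python
import numpy as np

def correct_ratings(examinee_ids, rater_ids, ratings):
    """Assumed model: rating = performance[examinee] + leniency[rater] + error.

    Returns (corrected, leniency): dicts keyed by examinee and by rater.
    Positive leniency means the rater scores higher than average (lenient);
    negative means stringent. Requires a connected rating design.
    """
    examinees = sorted(set(examinee_ids))
    raters = sorted(set(rater_ids))
    e_index = {e: i for i, e in enumerate(examinees)}
    r_index = {r: j for j, r in enumerate(raters)}
    n_e, n_r = len(examinees), len(raters)

    # One indicator column per examinee, then one per rater.
    X = np.zeros((len(ratings), n_e + n_r))
    for row, (e, r) in enumerate(zip(examinee_ids, rater_ids)):
        X[row, e_index[e]] = 1.0
        X[row, n_e + r_index[r]] = 1.0
    y = np.asarray(ratings, dtype=float)

    # The design is rank deficient (a constant can move between the two
    # blocks), so lstsq returns one of many equivalent solutions.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    examinee_eff, rater_eff = beta[:n_e], beta[n_e:]

    # Assumed identification: rater offsets average zero; the removed
    # constant is added back to the examinee estimates.
    shift = rater_eff.mean()
    corrected = dict(zip(examinees, examinee_eff + shift))
    leniency = dict(zip(raters, rater_eff - shift))
    return corrected, leniency

# Hypothetical, noise-free data: R1 is lenient (+1), R2 is stringent (-1).
# Not every rater sees every examinee (an incomplete design).
examinee_ids = ["A", "A", "B", "C"]
rater_ids = ["R1", "R2", "R1", "R2"]
ratings = [6.0, 4.0, 7.0, 6.0]

corrected, leniency = correct_ratings(examinee_ids, rater_ids, ratings)
```

In this made-up example, examinee B's raw rating (7) exceeds C's (6) only because B drew the lenient rater; after correction the sketch ranks C above B. A reordering of that kind near a cut score is the sort of effect that could change a pass/fail decision.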

This publication has 10 references indexed in Scilit:

Assessment of clinical skills with standardized patients: State of the art
Teaching and Learning in Medicine, 1990
Parameter Estimation for Peer Grading under Incomplete Design
Educational and Psychological Measurement, 1988
Missing Data in Evaluation Research
Evaluation & the Health Professions, 1986
A Deterministic Theory of Clinical Performance Rating
Evaluation & the Health Professions, 1984
Two Simple Models for Rater Effects
Applied Psychological Measurement, 1984
Balanced Incomplete Block Designs for Inter-Rater Reliability Studies
Applied Psychological Measurement, 1981
Effects of rater training: Creating new response sets and decreasing accuracy
Journal of Applied Psychology, 1980
Effects of rater training on leniency and halo errors in student ratings of instructors
Journal of Applied Psychology, 1978
The Validity and Reliability of Oral Examinations in Assessing Cognitive Skills in Medicine
Journal of Educational Measurement, 1970
A Coefficient of Agreement for Nominal Scales
Educational and Psychological Measurement, 1960