Sources of unreliability and bias in standardized‐patient rating
- 1 January 1991
- research article
- Published by Taylor & Francis in Teaching and Learning in Medicine
- Vol. 3 (2), 74-85
- https://doi.org/10.1080/10401339109539486
Abstract
In tests of clinical competence, standardized patients (SPs) can be used to present the clinical problem and rate actions taken by the examinee in the patient encounter. Both these aspects of the "test" have the potential to contribute to unreliability and bias in measurement. In 1987, two universities collaborated to develop and administer the same SP test to clinical clerks in their respective institutions. This provided us with the opportunity to evaluate rating bias attributable to test site and three sources of rating unreliability within the same population of raters: those attributable to inconsistencies within the same rater (within‐rater reliability), those attributable to inconsistencies between two raters trained in the same test site (between‐raters reliability—same site), and those attributable to inconsistencies between two raters trained in different test sites (between‐raters reliability—different sites). A stratified random sample of 537 of the 2,560 examinee‐patient encounters that occurred in the inter‐university examination was videotaped, providing equivalent representation of the 16 cases used in the test and the two universities. Videotaped encounters from both universities were rated by 44 SPs who presented and rated the case during the examination. Videotape and examination ratings were used to estimate systematic rating bias and the three types of rater reliability. Overall, rater reliability for individual items and the overall encounter score was fair to good (.37 to .52). Consistent with these results, raters within cases accounted for 20% of the observed variance in student scores. Within‐rater reliability was better than both types of between‐raters reliability. Rater agreement was not influenced by test site, but systematic differences in score were present between test sites. Site 1 raters scored the same students, on average, 6.7% lower than Site 2 raters. These differences had an impact on the proportion of students who would have failed the checklist portion of the test. In Site 1, 50% of the students rated had data‐collection scores below 60%, whereas, in Site 2, only 33% had scores below the 60% cutoff. The implications of these findings for single‐ and multi‐site SP‐based tests of competence are explored, and additional areas for research are identified.
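
The abstract reports rater reliability coefficients in the .37 to .52 range and a variance share of about 20% for raters within cases. As an illustration only, and not a reproduction of the authors' analysis, the sketch below computes a one-way random-effects intraclass correlation, ICC(1,1), from hypothetical paired checklist scores; the encounter count (537) matches the sample described above, but the score distributions and the two-rater layout are invented for the example.

```python
# Illustrative sketch: a one-way random-effects intraclass correlation,
# ICC(1,1), as one common index of between-raters agreement.
# Data here are simulated, not taken from the study.
import numpy as np

def icc_oneway(scores: np.ndarray) -> float:
    """ICC(1,1) for an (n_subjects, k_raters) matrix of scores."""
    n, k = scores.shape
    grand_mean = scores.mean()
    # One-way ANOVA mean squares: between subjects and within subjects.
    ms_between = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - scores.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: 537 encounters, each scored by two raters
# (e.g., the live-examination rating and the videotape rating).
rng = np.random.default_rng(0)
true_score = rng.normal(60, 10, size=537)        # latent checklist score (%)
ratings = np.column_stack([true_score + rng.normal(0, 8, 537),
                           true_score + rng.normal(0, 8, 537)])
print(f"ICC(1,1) = {icc_oneway(ratings):.2f}")
```

With the simulated error variance chosen here, the coefficient lands in the moderate range; larger rater error relative to true-score variance would pull it toward the .37 to .52 band the study reports.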