Reliability of objective structured clinical examinations: Four years of experience in a surgical clerkship

Abstract
Four years of experience with an objective structured clinical examination (OSCE) following an 8‐week surgical clerkship are described. To date, 356 students have been evaluated, using a 15‐station examination, with designated faculty observers scoring each station. Mean student performance across the 4 years ranged from 65.97% to 73.53%. Reliability coefficients for each year ranged from .05 to .57; when negatively correlated stations were removed, the range was .44 to .57. Examination of the reproducibility of scores for each year, using generalizability analyses, revealed lower dependability for absolute scores than for relative scores (range = .37 to .54). Combining the data from all 4 years, it was estimated that extending the examination to 50 stations would achieve a generalizability coefficient of .80. Student performance across years was compared by examining scores on stations used in more than 1 year. When the means and distributions of common test items from each year were equated, actual differences in student performance were small. Implications of these studies for improvement and development of the OSCE are discussed.
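The 50‐station projection is consistent with the Spearman‐Brown prophecy formula, which predicts reliability when a test is lengthened. A minimal sketch (the abstract does not give the exact computation; the single‐form reliability of ~.55 used here is an assumption drawn from the upper end of the reported range):

```python
def spearman_brown(rho: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by factor k,
    given the current reliability rho (Spearman-Brown prophecy formula)."""
    return k * rho / (1 + (k - 1) * rho)

# Assumed single-form reliability of ~0.55 for the 15-station OSCE;
# extending to 50 stations multiplies the length by 50/15:
projected = spearman_brown(0.55, 50 / 15)
print(round(projected, 2))  # roughly 0.80, matching the reported estimate
```

Solving the same formula for the length factor instead (k = rho_target·(1 − rho) / (rho·(1 − rho_target))) gives roughly k ≈ 3.3, i.e. about 50 stations, under the same assumed starting reliability.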