Assessing and Improving Evaluation of Aircrew Performance

Abstract
Evaluation of aircrew performance serves two critical functions: assessing the qualifications of individual pilots and, under newer proficiency-based training programs, providing data for modifying those programs. We apply psychometric methods to assessing and improving the quality of these evaluations. Quality evaluations require human judges to recognize and discriminate changes in performance levels (sensitivity) and to map these observations onto the appropriate grade-scale values (accuracy). We define statistical measures for both of these properties. We distinguish referent-rater reliability (RRR) from traditional interrater reliability, and we argue that RRR more meaningfully measures evaluators' grading performance and has clearer training implications. We also discuss the implementation of training and calibration sessions intended to improve evaluators' ratings of aircrew performance, and we offer several practical guidelines for designing and conducting these sessions.
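
The abstract does not state the statistical measures themselves, so the following is only an illustrative sketch under stated assumptions, not the authors' definitions. It assumes that sensitivity can be approximated by the Pearson correlation between one evaluator's grades and a set of referent (gold-standard) grades, and that accuracy can be approximated by the mean absolute deviation from those referent grades. The function names and the grade values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of grades."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def sensitivity(rater, referent):
    """ASSUMPTION: sensitivity taken as correlation with the referent grades.

    A high value means the rater's grades rise and fall with the
    referent's, regardless of any constant leniency or severity.
    """
    return pearson(rater, referent)


def accuracy_error(rater, referent):
    """ASSUMPTION: accuracy taken as mean absolute deviation from the referent.

    Zero means every grade landed on the referent's scale value.
    """
    return sum(abs(r - t) for r, t in zip(rater, referent)) / len(rater)


# Hypothetical grades on a 1-5 scale for six recorded crew performances.
referent = [1, 2, 3, 4, 2, 3]
rater_a = [2, 3, 4, 5, 3, 4]  # follows every change, but one grade too lenient
rater_b = [1, 2, 3, 4, 3, 2]  # usually on the right grade, misses two changes

for name, grades in (("A", rater_a), ("B", rater_b)):
    print(name, round(sensitivity(grades, referent), 3),
          round(accuracy_error(grades, referent), 3))
```

In this made-up data, rater A discriminates performance changes as the referent does but is consistently off the scale value, while rater B is closer to the scale values but tracks changes less well. Comparing each evaluator against a referent, rather than evaluators against one another, is the contrast the abstract draws between RRR and interrater reliability: two evaluators can agree with each other and still share the same bias.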
