Abstract
A priori estimates of item characteristics are necessary for the efficient development of sound tests. Judges were asked to rate the Format, Relevancy, Difficulty, Discrimination, and Overall Quality of multiple-choice items for an examination covering health science information. Ratings of Relevancy were least reliable. The combined estimates of Difficulty did not correspond with empirical values of item difficulty; however, the combined ratings for Discrimination did correlate (point-biserial) significantly (p < .05) with item-total test scores. Additionally, combined ratings of Overall Quality were correlated significantly (p < .05) with item-total correlations.
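A minimal sketch (not the authors' code) of the kind of comparison the abstract describes: empirical item difficulty and item-total point-biserial discrimination are computed from dichotomous responses, then correlated with judges' combined a priori ratings. All data, variable names, and sample sizes below are hypothetical.

```python
# Hypothetical illustration of comparing judges' a priori item ratings
# with empirical item statistics; all data here are simulated.
import numpy as np
from scipy.stats import pointbiserialr, pearsonr

rng = np.random.default_rng(0)
n_examinees, n_items = 200, 30

# Simulated dichotomous (0/1) responses to a multiple-choice examination.
responses = (rng.random((n_examinees, n_items))
             < rng.uniform(0.3, 0.9, n_items)).astype(int)

# Empirical item difficulty: proportion of examinees answering correctly.
difficulty = responses.mean(axis=0)

# Empirical item discrimination: point-biserial correlation between each
# dichotomous item score and the total score on the remaining items.
totals = responses.sum(axis=1)
item_total_r = np.array([
    pointbiserialr(responses[:, j], totals - responses[:, j])[0]
    for j in range(n_items)
])

# Hypothetical judges' combined (mean) a priori ratings per item.
rated_difficulty = rng.uniform(1, 5, n_items)
rated_discrimination = rng.uniform(1, 5, n_items)

# Do the combined ratings track the empirical values?
r_diff, p_diff = pearsonr(rated_difficulty, difficulty)
r_disc, p_disc = pearsonr(rated_discrimination, item_total_r)
print(f"Rated vs. empirical difficulty:      r = {r_diff:.2f}, p = {p_diff:.3f}")
print(f"Rated vs. item-total discrimination: r = {r_disc:.2f}, p = {p_disc:.3f}")
```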