Major issues in evaluating student performance in clinical clerkships are reviewed, and the development of a clerkship evaluation system that attempts to address each issue is described. Three evaluation methods are used: a multiple-choice examination, an oral examination, and ward ratings. Scores from each method for the classes of 1977–80 were analyzed to determine their dependability and validity. Results indicate that the multiple-choice and oral examination scores were highly dependable each year; the ward ratings, however, were not dependable during their first year of use. Over the four-year period, the ward rating criteria were held constant, annual feedback was provided to individual faculty, and residents were added as raters. The cumulative effect appears to have been a pronounced improvement in the dependability of the ward rating scores. As a result, ward rating scores now receive greater emphasis in evaluating student clerkship performance.