Establishing Validity for Performance-Based Assessments: An Illustration for Collections of Student Writing

Abstract
Techniques for establishing the reliability and validity of assessments of student writing are presented. Raters scored collections of elementary students' narrative writing using the holistic scales of two rubrics: a new rubric designed for classroom use and known to enhance teacher practice, and an established rubric for large-scale writing assessment. Score reliabilities were compared using three methods: percentage agreement, correlations between rater pairs, and generalizability studies. Evidence for the validity of scores was compared on the basis of (a) correlations of scores with results from two other methods of writing assessment, (b) developmental patterns across grade levels, and (c) consistency of decisions made across methods of assessment. Results were mixed: the new rubric showed good evidence of reliability and developmental validity, but correlational patterns were unclear. The importance of establishing performance-based assessments of writing that are both technically sound and usable by teachers is discussed.
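
For concreteness, the first two reliability indices named above (percentage agreement and rater-pair correlation) can be computed as in the minimal Python sketch below. The rater scores and variable names are hypothetical illustrations, not data from the study:

    from statistics import correlation  # Pearson r; Python 3.10+

    def percent_agreement(scores_a, scores_b, tolerance=0):
        # Proportion of papers on which two raters' scores agree.
        # tolerance=0 gives exact agreement; tolerance=1 gives
        # adjacent agreement (scores within one scale point).
        hits = sum(abs(a - b) <= tolerance
                   for a, b in zip(scores_a, scores_b))
        return hits / len(scores_a)

    # Hypothetical holistic scores on a 1-6 scale from two raters
    rater_1 = [3, 4, 4, 2, 5, 3, 4, 1, 5, 3]
    rater_2 = [3, 4, 3, 2, 5, 4, 4, 2, 5, 3]

    print(percent_agreement(rater_1, rater_2))     # exact:    0.70
    print(percent_agreement(rater_1, rater_2, 1))  # adjacent: 1.00
    print(correlation(rater_1, rater_2))           # r ≈ 0.89

The third method, a generalizability study, goes further by partitioning score variance into components attributable to students, raters, and tasks, and is typically carried out with dedicated variance-components software.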