Agreement Coefficients as Indices of Dependability for Domain-Referenced Tests

Abstract
A large number of seemingly diverse coefficients have been proposed as indices of dependability, or reliability, for domain-referenced and/or mastery tests. In this paper it is shown that most of these indices are special cases of two generalized indices of agreement: one that is corrected for chance and one that is not. The special cases of these two indices are determined by assumptions about the nature of the agreement function or, equivalently, the nature of the loss function for the testing procedure. For example, indices discussed by Huynh (1976), Subkoviak (1976), and Swaminathan, Hambleton, and Algina (1974) employ a threshold agreement, or loss, function, whereas indices discussed by Brennan and Kane (1977a, 1977b) and Livingston (1972a) employ a squared-error loss function. Since all of these indices are discussed within a single general framework, the differences among them in their assumptions, properties, and uses can be exhibited clearly. For purposes of comparison, norm-referenced generalizability coefficients are also developed and discussed within this general framework.