Agreement among 2 × 2 Agreement Indices

Abstract
A variety of measures of reliability for two-category nominal scales are reviewed and compared. It is shown that, once these indices are corrected for chance agreement, only five distinct indices remain: Fleiss's modification of A1, the φ coefficient, Cohen's kappa, and two intraclass coefficients. Additional derivations indicate that when the marginals are held constant, all but one of the measures are linear functions of agreement and, thus, of one another. In particular, they become equal once their maximum obtainable values for a given data set are equated. The single exception is an intraclass correlation that explicitly counts variation due to differences between observer means as part of the error variance. This index depends on sample size; moreover, as the number of subjects increases, it approaches the kappa coefficient as a limit. Recommendations for choosing an index of agreement are made on the basis of definitions, magnitude, convenience, and consistency.
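The claim that, with fixed marginals, the indices are linear functions of agreement and coincide after rescaling by their maxima can be checked numerically. The sketch below covers only two of the indices, Cohen's kappa and φ, using their standard textbook formulas. It does not reproduce Fleiss's modification of A1 or the intraclass coefficients. The table layout and the function name `agreement_indices` are illustrative assumptions, not taken from the paper.

```python
import math


def agreement_indices(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Cohen's kappa and phi for a 2x2 table of two raters (hypothetical helper).

    Assumed layout: a = both raters "yes", d = both raters "no",
    b = rater 1 "yes" / rater 2 "no", c = rater 1 "no" / rater 2 "yes".
    """
    n = a + b + c + d
    p_obs = (a + d) / n  # raw proportion of agreement
    p1 = (a + b) / n  # rater 1's marginal proportion of "yes"
    p2 = (a + c) / n  # rater 2's marginal proportion of "yes"
    # Cohen's kappa: chance agreement computed from each rater's own marginals.
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (p_obs - p_chance) / (1 - p_chance)
    # phi: the Pearson correlation between the two binary ratings.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return {"p_obs": p_obs, "kappa": kappa, "phi": phi}


if __name__ == "__main__":
    # Hold the marginals fixed: n = 100, rater 1 says "yes" 40 times,
    # rater 2 says "yes" 50 times. Only the joint "yes" count a varies.
    n, r1, c1 = 100, 40, 50
    for a in range(20, 41, 5):
        b, c = r1 - a, c1 - a
        d = n - a - b - c
        r = agreement_indices(a, b, c, d)
        print(f"a={a}  p_obs={r['p_obs']:.2f}  kappa={r['kappa']:.3f}  phi={r['phi']:.3f}")
```

With these marginals, each step of 5 in `a` raises kappa by exactly 0.2 and φ by about 0.204. Both indices are therefore linear in the observed agreement. At the largest obtainable `a` (40), kappa reaches 0.8 and φ reaches about 0.816. Dividing each index by that maximum gives identical values for every table in the loop.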
