Inference Procedures for Assessing Interobserver Agreement among Multiple Raters

Summary. We propose a new procedure for constructing inferences about a measure of interobserver agreement in studies involving a binary outcome and multiple raters. The proposed procedure, based on a chi-square goodness-of-fit test as applied to the correlated binomial model (Bahadur, 1961, in Studies in Item Analysis and Prediction, 158–176), is an extension of the goodness-of-fit procedure developed by Donner and Eliasziw (1992, Statistics in Medicine 11, 1511–1519) for the case of two raters. The new procedure is shown to provide confidence-interval coverage levels that are close to nominal over a wide range of parameter combinations. The procedure also provides a sample-size formula that may be used to determine the required number of subjects and raters for such studies.
