Inference Procedures for Assessing Interobserver Agreement among Multiple Raters

1 June 2001

journal article
Published by Oxford University Press (OUP) in Biometrics

Vol. 57 (2), 584-588
https://doi.org/10.1111/j.0006-341x.2001.00584.x

Abstract

Summary. We propose a new procedure for constructing inferences about a measure of interobserver agreement in studies involving a binary outcome and multiple raters. The proposed procedure, based on a chi-square goodness-of-fit test as applied to the correlated binomial model (Bahadur, 1961, in Studies in Item Analysis and Prediction, 158–176), is an extension of the goodness-of-fit procedure developed by Donner and Eliasziw (1992, Statistics in Medicine 11, 1511–1519) for the case of two raters. The new procedure is shown to provide confidence-interval coverage levels that are close to nominal over a wide range of parameter combinations. The procedure also provides a sample-size formula that may be used to determine the required number of subjects and raters for such studies.

This publication has 16 references indexed in Scilit:

Efficient Estimation of the Intraclass Correlation for a Binary Trait
Journal of Agricultural, Biological and Environmental Statistics, 1996
A goodness‐of‐fit approach to inference procedures for the kappa statistic: Confidence interval construction, significance‐testing and sample size estimation
Statistics in Medicine, 1992
How many raters? Toward the most reliable diagnostic consensus
Statistics in Medicine, 1992
Analysing Intraclass Correlation for Dichotomous Variables
Journal of the Royal Statistical Society Series C: Applied Statistics, 1988
Modeling Agreement Among Raters
Journal of the American Statistical Association, 1985