Measuring pairwise interobserver agreement when all subjects are judged by the same observers

1 June 1982

journal article
Published by Wiley in Statistica Neerlandica

Vol. 36 (2), 45-61
https://doi.org/10.1111/j.1467-9574.1982.tb00774.x

Abstract

An experiment is considered where each of a sample of subjects is rated on an L‐point scale by each of a fixed group of observers. Weighted kappa coefficients are defined to measure the degree of agreement among the observers, between two particular observers, or between a particular observer and the other observers. Attention is paid to the selection of one or more homogeneous subgroups of observers. A linearized Taylor series expansion is used to derive explicit formulas for the computation of large sample standard errors. The procedures are illustrated within the context of a study where seven pathologists separately classified 118 histological slides into five categories.
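
As an illustration of the kind of coefficient the abstract refers to, the following minimal sketch computes the standard weighted kappa of Cohen (1968) for each pair of observers who rated the same subjects. It is not taken from the article: the function names, the 0-based category coding, and the choice of disagreement weights |i - j|^power are assumptions made for this example, and the article's own group-level coefficients and standard error formulas are not reproduced here.

```python
from itertools import combinations


def weighted_kappa(ratings_a, ratings_b, n_categories, power=2):
    """Cohen's weighted kappa for two observers rating the same subjects.

    Ratings are integers in 0..n_categories-1 (an assumed coding).
    The disagreement weight for categories i and j is |i - j| ** power
    (power=1 gives linear weights, power=2 quadratic weights).
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both observers must rate the same non-empty sample")

    n = len(ratings_a)
    size = n_categories

    # Joint proportions: joint[i][j] is the share of subjects placed in
    # category i by the first observer and category j by the second.
    joint = [[0.0] * size for _ in range(size)]
    for a, b in zip(ratings_a, ratings_b):
        joint[a][b] += 1.0 / n

    # Marginal proportions of each observer.
    row = [sum(joint[i]) for i in range(size)]
    col = [sum(joint[i][j] for i in range(size)) for j in range(size)]

    # Observed and chance-expected weighted disagreement.
    observed = sum(
        abs(i - j) ** power * joint[i][j]
        for i in range(size)
        for j in range(size)
    )
    expected = sum(
        abs(i - j) ** power * row[i] * col[j]
        for i in range(size)
        for j in range(size)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")

    return 1.0 - observed / expected


def pairwise_kappas(ratings_by_observer, n_categories, power=2):
    """Weighted kappa for every pair of observers, keyed by (name, name)."""
    names = sorted(ratings_by_observer)
    return {
        (p, q): weighted_kappa(
            ratings_by_observer[p], ratings_by_observer[q], n_categories, power
        )
        for p, q in combinations(names, 2)
    }


if __name__ == "__main__":
    # Hypothetical ratings of six subjects on a five-point scale.
    ratings = {
        "A": [0, 1, 2, 3, 4, 2],
        "B": [0, 1, 2, 3, 4, 3],
        "C": [1, 1, 2, 4, 4, 2],
    }
    for pair, kappa in pairwise_kappas(ratings, n_categories=5).items():
        print(pair, round(kappa, 3))
```

A table of pairwise values like this is one plausible starting point for spotting subgroups of observers who agree closely with each other, although the article's own selection procedure may differ.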

This publication has 19 references indexed in Scilit:

A general formula for the variance of Cohen's weighted kappa.
Psychological Bulletin, 1978
Testing Patterned Hypotheses in Multi-Way Contingency Tables Using Weighted Kappa and Weighted Chi Square
Educational and Psychological Measurement, 1977
Kappa revisited.
Psychological Bulletin, 1977
A review of statistical methods in the analysis of data arising from observer reliability studies (Part II)
Statistica Neerlandica, 1975
Measuring nominal scale agreement among many raters.
Psychological Bulletin, 1971
Measures of response agreement for qualitative data: Some generalizations and alternatives.
Psychological Bulletin, 1971
Large sample standard errors of kappa and weighted kappa.
Psychological Bulletin, 1969
Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit.
Psychological Bulletin, 1968
A Coefficient of Agreement for Nominal Scales
Educational and Psychological Measurement, 1960