How reliable are chance‐corrected measures of agreement?

Abstract
Chance‐corrected measures of agreement are prone to paradoxical and counter‐intuitive results when used as measures of reliability. It is demonstrated that these problems arise with Cohen's kappa as well as with Aickin's alpha. They are the consequence of an analogue of Simpson's paradox arising in mixed populations. It is further shown that chance‐corrected measures of agreement may yield misleading values for binary ratings. It is concluded that improvements in the design and the analysis of reliability studies are a prerequisite for valid and pertinent results.
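As a minimal sketch of the kind of counter‐intuitive behaviour the abstract describes for binary ratings, the following Python snippet computes Cohen's kappa for two hypothetical 2x2 rater tables (not taken from the paper): both have identical observed agreement, yet kappa differs sharply once the marginal prevalences become skewed.

    # Hypothetical illustration of a kappa paradox for binary ratings:
    # identical observed agreement, very different chance-corrected values.

    def cohen_kappa(table):
        """Cohen's kappa for a 2x2 contingency table of two raters' binary ratings."""
        n = sum(sum(row) for row in table)
        p_o = (table[0][0] + table[1][1]) / n          # observed agreement
        row = [sum(r) / n for r in table]              # rater A marginal proportions
        col = [sum(c) / n for c in zip(*table)]        # rater B marginal proportions
        p_e = sum(r * c for r, c in zip(row, col))     # expected chance agreement
        return (p_o - p_e) / (1 - p_e)

    balanced = [[40, 9], [6, 45]]   # prevalence near 50%
    skewed   = [[80, 10], [5, 5]]   # one category dominates

    for name, t in (("balanced", balanced), ("skewed", skewed)):
        n = sum(map(sum, t))
        p_o = (t[0][0] + t[1][1]) / n
        print(f"{name}: observed agreement = {p_o:.2f}, kappa = {cohen_kappa(t):.2f}")

    # balanced: observed agreement = 0.85, kappa = 0.70
    # skewed:   observed agreement = 0.85, kappa = 0.32

Here 85 per cent of ratings agree in both tables, but the skewed prevalence drives the expected chance agreement up and kappa down, which is the sense in which chance‐corrected values can mislead for binary ratings.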