How reliable are chance-corrected measures of agreement?
- 15 December 1993
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 12 (23), 2191-2205
- https://doi.org/10.1002/sim.4780122305
Abstract
Chance-corrected measures of agreement are prone to exhibit paradoxical and counter-intuitive results when used as measures of reliability. It is demonstrated that these problems arise with Cohen's kappa as well as with Aickin's alpha. They are the consequence of an analogy to Simpson's paradox in mixed populations. It is further shown that chance-corrected measures of agreement may yield misleading values for binary ratings. It is concluded that improvements in the design and the analysis of reliability studies are a pre-requisite for valid and pertinent results.
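As a rough illustration of the kind of paradox the abstract refers to (this sketch is not taken from the paper itself), the standard formula for Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), can give sharply different values for two binary rating tables with identical observed agreement once the marginal prevalences become skewed. The tables below are hypothetical examples chosen only to show the effect.

```python
# Illustrative sketch (not from the paper): Cohen's kappa for two raters on
# binary ratings, showing the well-known "high agreement, low kappa" effect
# that appears when marginal prevalences are heavily skewed.

def cohen_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]],
    where a and d are the counts on which the two raters agree."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_o = (a + d) / n                        # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)    # chance agreement on "yes"
    p_no = ((c + d) / n) * ((b + d) / n)     # chance agreement on "no"
    p_e = p_yes + p_no                       # total chance-expected agreement
    return (p_o - p_e) / (1 - p_e)

# Both hypothetical tables have 90% observed agreement, yet kappa differs
# sharply because the second table's marginals are skewed toward "yes".
balanced = [[45, 5], [5, 45]]   # prevalence of "yes" near 50%
skewed   = [[85, 5], [5, 5]]    # prevalence of "yes" near 90%

print(cohen_kappa(balanced))    # 0.80
print(cohen_kappa(skewed))      # about 0.44
```

Both tables report the same 90 per cent raw agreement, yet the chance-corrected value drops from 0.80 to roughly 0.44; this is the sort of counter-intuitive behaviour, discussed at length in the paper, that motivates more careful design and analysis of reliability studies.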