A comparison of methods for calculating a stratified kappa
- 1 September 1991
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 10 (9), 1465-1472
- https://doi.org/10.1002/sim.4780100913
Abstract
Investigators use the kappa coefficient to measure chance-corrected agreement among observers in the classification of subjects into nominal categories. The marginal probability of classification may depend, however, on one or more confounding variables. We consider assessment of interrater agreement with subjects grouped into strata on the basis of these confounders. We assume overall agreement across strata is constant and consider a stratified index of agreement, or ‘stratified kappa’, based on weighted summations of the individual kappas. We use three weighting schemes: (1) equal weighting; (2) weighting by the size of the table; and (3) weighting by the inverse of the variance. In a simulation study we compare these methods under differing probability structures and differing sample sizes for the tables. We find weighting by sample size moderately efficient under most conditions. We illustrate the techniques by assessing agreement between surgeons and graders of fundus photographs with respect to retinal characteristics, with stratification by initial severity of the disease.
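The three weighting schemes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cohen_kappa` and `stratified_kappa`, the example tables, and the variance formula are assumptions made for illustration. In particular, the variance uses Cohen's simple large-sample approximation, p_o(1 - p_o) / (n(1 - p_e)^2), which may differ from the estimator used in the article.

```python
def cohen_kappa(table):
    """Return (kappa, approximate variance, n) for a square table of counts.

    table[i][j] is the number of subjects placed in category i by the
    first rater and in category j by the second rater.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed proportion of agreement (diagonal cells).
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance under independent marginals.
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    kappa = (p_o - p_e) / (1 - p_e)
    # Assumption: Cohen's simple approximation, not necessarily the
    # variance estimator used in the article.
    variance = p_o * (1 - p_o) / (n * (1 - p_e) ** 2)
    return kappa, variance, n


def stratified_kappa(tables, scheme="size"):
    """Weighted average of per-stratum kappas.

    scheme is one of "equal", "size", or "inverse_variance".
    """
    stats = [cohen_kappa(t) for t in tables]
    if scheme == "equal":
        weights = [1.0 for _ in stats]
    elif scheme == "size":
        weights = [float(n) for _, _, n in stats]
    elif scheme == "inverse_variance":
        # Fails if a stratum shows perfect agreement (variance of zero).
        weights = [1.0 / v for _, v, _ in stats]
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    total = sum(weights)
    return sum(w * kap for w, (kap, _, _) in zip(weights, stats)) / total


# Hypothetical 2x2 agreement tables for two strata (not data from the article).
strata = [
    [[20, 5], [10, 15]],   # stratum 1, n = 50
    [[40, 10], [10, 40]],  # stratum 2, n = 100
]

for scheme in ("equal", "size", "inverse_variance"):
    print(scheme, round(stratified_kappa(strata, scheme), 4))
```

All three schemes compute the same weighted average and differ only in the weights, so the per-stratum statistics are computed once and reused. Note that inverse-variance weights depend on estimated variances, which can be unstable in small strata.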