A comparison of methods for calculating a stratified kappa

1 September 1991

journal article
research article
Published by Wiley in Statistics in Medicine

Vol. 10 (9) , 1465-1472
https://doi.org/10.1002/sim.4780100913

Abstract

Investigators use the kappa coefficient to measure chance-corrected agreement among observers in the classification of subjects into nominal categories. The marginal probability of classification may depend, however, on one or more confounding variables. We consider assessment of interrater agreement with subjects grouped into strata on the basis of these confounders. We assume overall agreement across strata is constant and consider a stratified index of agreement, or ‘stratified kappa’, based on weighted summations of the individual kappas. We use three weighting schemes: (1) equal weighting; (2) weighting by the size of the table; and (3) weighting by the inverse of the variance. In a simulation study we compare these methods under differing probability structures and differing sample sizes for the tables. We find weighting by sample size moderately efficient under most conditions. We illustrate the techniques by assessing agreement between surgeons and graders of fundus photographs with respect to retinal characteristics, with stratification by initial severity of the disease.
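
The three weighting schemes named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the example tables, and in particular the simple large-sample variance approximation p_o(1 - p_o) / (n(1 - p_e)^2) are assumptions made here, and the paper may well use a different variance estimator.

```python
def cohen_kappa(table):
    """Return (kappa, approximate variance, n) for one square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n           # observed agreement
    row_m = [sum(table[i]) / n for i in range(k)]            # rater A marginals
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginals
    p_exp = sum(r * c for r, c in zip(row_m, col_m))         # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Assumed simple large-sample approximation; it is zero when p_obs == 1,
    # in which case inverse-variance weighting below is undefined.
    var = p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2)
    return kappa, var, n


def stratified_kappa(tables, scheme="size"):
    """Weighted average of per-stratum kappas under one of three schemes."""
    stats = [cohen_kappa(t) for t in tables]
    if scheme == "equal":
        weights = [1.0 for _ in stats]
    elif scheme == "size":
        weights = [float(n) for _, _, n in stats]
    elif scheme == "inverse_variance":
        weights = [1.0 / var for _, var, _ in stats]
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    return sum(w * kap for w, (kap, _, _) in zip(weights, stats)) / sum(weights)


# Hypothetical 2x2 agreement tables, one per stratum (rows: rater A, columns: rater B).
strata = [
    [[20, 5], [10, 15]],
    [[40, 10], [10, 40]],
]

for scheme in ("equal", "size", "inverse_variance"):
    print(scheme, round(stratified_kappa(strata, scheme), 4))
```

With these made-up tables the per-stratum kappas are 0.4 and 0.6, so every stratified estimate falls between those two values; the schemes differ only in how strongly the larger, more precisely estimated stratum pulls the average toward 0.6.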
