Testing the Normal Approximation and Minimal Sample Size Requirements of Weighted Kappa When the Number of Categories is Large

1 January 1981

journal article
Published by SAGE Publications in Applied Psychological Measurement

Vol. 5 (1) , 101-104
https://doi.org/10.1177/014662168100500114

Abstract

The results of this computer simulation study indicate that the weighted kappa statistic, employing a standard error developed by Fleiss, Cohen, and Everitt (1969), holds for a large number of k categories of classification (e.g., 8 < k ≤ 10). These data are entirely consistent with an earlier study (Cicchetti & Fleiss, 1977), which showed the same results for 3 ≤ k ≤ 7. The two studies also indicate that the minimal N required for the valid application of weighted kappa can be easily approximated by the simple formula 2k². This produces sample sizes that range from a low of about 20 (when k = 3) to a high of about 200 (when k = 10). Finally, the range 3 ≤ k ≤ 10 should encompass most extant clinical scales of classification.
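The 2k² rule of thumb stated in the abstract can be tabulated with a short sketch. The function name `minimal_n` and the range check are illustrative assumptions, not part of the article; the check only reflects that the abstract reports results for 3 ≤ k ≤ 10.

```python
def minimal_n(k: int) -> int:
    """Approximate minimal sample size for weighted kappa via the 2 * k**2 rule.

    The name and the range guard are hypothetical; the abstract only
    states the formula and the range of k that was studied.
    """
    if not 3 <= k <= 10:
        raise ValueError("the rule is reported only for 3 <= k <= 10")
    return 2 * k ** 2


if __name__ == "__main__":
    # Print the approximate minimal N for each number of categories studied.
    for k in range(3, 11):
        print(f"k = {k:2d}  minimal N = {minimal_n(k)}")
```

For k = 3 this gives 18 (the "about 20" of the abstract), and for k = 10 it gives 200.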

This publication has 8 references indexed in Scilit:

Reliability of a Schedule for Rating Personality Disorders
The British Journal of Psychiatry, 1979
Classification of Personality Disorder
The British Journal of Psychiatry, 1979
Comparison of the Null Distributions of Weighted Kappa and the C Ordinal Statistic
Applied Psychological Measurement, 1977
Assessing Inter-Rater Reliability for Rating Scales: Resolving some Basic Issues
The British Journal of Psychiatry, 1976
Prolonged Exposure: a Rapid Treatment for Phobias
BMJ, 1971
Large Sample Standard Errors of Kappa and Weighted Kappa
Psychological Bulletin, 1969
Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit
Psychological Bulletin, 1968
Severe Agoraphobia: A Controlled Prospective Trial of Behaviour Therapy
The British Journal of Psychiatry, 1966