A review of statistical methods in the analysis of data arising from observer reliability studies (Part I)*

Abstract
This paper reviews research situations in medicine, epidemiology and psychiatry, in psychological measurement and testing, and in sample surveys in which the observer (rater or interviewer) can be an important source of measurement error. Moreover, most of the statistical literature on observer variability is surveyed, with attention given to a notational unification of the various models proposed. In the continuous data case, the usual analysis of variance (ANOVA) components of variance models are presented with an emphasis on the intraclass correlation coefficient as a measure of reliability. Other modified ANOVA models, response error models in sample surveys, and related multivariate extensions are also discussed. For the categorical data case, special attention is given to measures of agreement and tests of hypotheses when the data consist of dichotomous responses. In addition, similarities between the dichotomous and continuous cases are illustrated in terms of intraclass correlation coefficients. Finally, measures of agreement, such as kappa and weighted‐kappa, are discussed in the context of nominal and ordinal data. A proposed unifying framework for the categorical data case is given in the form of concluding remarks.

This publication has 83 references indexed in Scilit: