Abstract

An experiment is considered where each of a sample of subjects is rated on an L‐point scale by each of a fixed group of observers. Weighted kappa coefficients are defined to measure the degree of agreement among the observers, between two particular observers, or between a particular observer and the other observers. Attention is paid to the selection of one or more homogeneous subgroups of observers. A linearized Taylor series expansion is used to derive explicit formulas for the computation of large sample standard errors. The procedures are illustrated within the context of a study where seven pathologists separately classified 118 histological slides into five categories.
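For orientation, the simplest of these cases, agreement between two particular observers, can be sketched with the standard two-rater weighted kappa, (p_o - p_e) / (1 - p_e), where p_o and p_e are the weighted observed and chance-expected agreement. The abstract does not state the paper's own definitions or weights, so the function name `weighted_kappa`, its parameters, and the linear and quadratic agreement weights below are illustrative assumptions rather than the authors' formulas; the multi-observer coefficients and the standard errors are not covered.

```python
def weighted_kappa(ratings_a, ratings_b, num_categories, weights="linear"):
    """Two-rater weighted kappa for ratings coded 1..num_categories.

    Hypothetical helper: the weighting schemes are common textbook
    choices, assumed here because the abstract does not specify any.
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both observers must rate the same non-empty sample")
    if num_categories < 2:
        raise ValueError("at least two categories are required")
    L = num_categories

    # Cross-classification counts: counts[i][j] is the number of subjects
    # placed in category i+1 by observer A and category j+1 by observer B.
    counts = [[0] * L for _ in range(L)]
    for a, b in zip(ratings_a, ratings_b):
        counts[a - 1][b - 1] += 1

    row_totals = [sum(counts[i]) for i in range(L)]
    col_totals = [sum(counts[i][j] for i in range(L)) for j in range(L)]

    def agreement_weight(i, j):
        # 1 on the diagonal, decreasing with the distance between categories.
        d = abs(i - j) / (L - 1)
        return 1.0 - d if weights == "linear" else 1.0 - d * d

    # Weighted observed agreement and weighted chance-expected agreement.
    p_obs = sum(agreement_weight(i, j) * counts[i][j]
                for i in range(L) for j in range(L)) / n
    p_exp = sum(agreement_weight(i, j) * row_totals[i] * col_totals[j]
                for i in range(L) for j in range(L)) / (n * n)

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)
```

With five categories, as in the pathology study, a call would look like `weighted_kappa(observer_1, observer_2, 5)`, where the two arguments are equal-length lists of category codes for the same slides. Identical ratings give a value of 1, and agreement no better than chance gives a value near 0.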