When diagnostic agreement is high, but reliability is low: Some paradoxes occurring in joint independent neuropsychology assessments
- 1 October 1988
- Research article
- Published by Taylor & Francis in Journal of Clinical and Experimental Neuropsychology
- Vol. 10 (5) , 605-622
- https://doi.org/10.1080/01688638808402799
Abstract
Two paradoxes can occur when neuropsychologists attempt to assess the reliability of a dichotomous diagnostic instrument (e.g., one measuring the presence or absence of Dyslexia or Autism). The first paradox occurs when two pairs of examiners produce the same high level of overall agreement (e.g., 85%), yet the level of chance-corrected agreement is relatively high for one pair (e.g., .70) and quite low for the other (e.g., .32). To illustrate the second paradox, consider two examiners who are in 80% agreement in their overall diagnosis of Dyslexia. Assume, further, that they agree exactly on the proportion of cases each diagnoses as Dyslexic (20%) and as Non-Dyslexic (80%). Somewhat paradoxically, the level of chance-corrected interexaminer agreement for this pair of examiners works out to only .37. In distinct contrast, a second pair of examiners, also in 80% overall agreement, disagrees appreciably with respect to diagnostic assignments: the first neuropsychologist (a) classifies 65% of the cases as Non-Dyslexic, as opposed to 45% so diagnosed by the second neuropsychologist, and (b) classifies the remaining 35% as Dyslexic, as compared to 55% so classified by the second examiner. Despite this disagreement, the second pair of examiners produces a much higher level of chance-corrected agreement than the first pair, namely .61. The underlying reasons for both of these paradoxes, as well as their resolution, are presented.