Abstract
Capture-recapture methods in epidemiology analyze data from overlapping lists of cases from various sources of ascertainment to generate estimates of missing cases and of the total affected. Applications of these methods usually recognize the possibility of, and attempt to adjust for, nonindependent ascertainment by the various sources used. Separate from the issue of dependencies between sources, however, is the complexity of within-source variation in the probability of ascertainment of cases, e.g., variation in ascertainment by population subgroups such as socioeconomic classes, races, or other subdivisions. The authors present a general approach to this issue for the two-source case that accounts not only for biases that arise from such “variable catchability” within sources but also for the separate complexity of dependencies between sources. A general formula, (K − ▵)/(K + ▿), is derived that allows simultaneous calculation of the effects of variable catchability, ▵, and source dependencies, ▿, upon the accuracy of the two-source estimate. The effect of variable catchability upon accuracy is presented, along with an application to data by race on Huntington's disease, a neurodegenerative disorder. In the latter analysis, multiple two-source estimates of prevalence were made, each considering one source versus all others pooled. Most of the likely bias was found to be due to source dependencies; variable catchability contributed relatively little bias. Multiple poolings of all but one source may prove a generally efficient method for overcoming the problem of likely variable catchability, at least when there are data from many distinct sources.
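As an illustration of the "one source versus all others pooled" procedure, the following minimal Python sketch computes a two-source estimate for each source in turn. It assumes the standard uncorrected two-source (Lincoln-Petersen) estimator, N = n1 × n2 / m, which the abstract does not name explicitly; it does not implement the (K − ▵)/(K + ▿) formula, whose terms are not defined here. The function names, source names, and case identifiers are hypothetical.

```python
def two_source_estimate(n1, n2, m):
    """Uncorrected two-source estimate of the total number of cases.

    n1, n2: number of cases found by each source; m: cases found by both.
    Assumes independent sources and equal catchability, the very
    assumptions whose violation the abstract is concerned with.
    """
    if m <= 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return n1 * n2 / m


def one_vs_pooled_estimates(sources):
    """Estimate the total for each source versus all other sources pooled.

    sources: dict mapping a source name to a set of case identifiers.
    Returns a dict mapping each source name to its two-source estimate.
    """
    if len(sources) < 2:
        raise ValueError("at least two sources are required")
    estimates = {}
    for name, cases in sources.items():
        # Pool every source except the one under consideration.
        pooled = set().union(
            *(ids for other, ids in sources.items() if other != name)
        )
        overlap = len(cases & pooled)
        estimates[name] = two_source_estimate(len(cases), len(pooled), overlap)
    return estimates


# Hypothetical data: three sources, seven distinct observed cases.
example_sources = {
    "hospital": {1, 2, 3, 4},
    "registry": {3, 4, 5, 6},
    "clinic": {1, 4, 6, 7},
}
example_estimates = one_vs_pooled_estimates(example_sources)
```

With these made-up lists, each source has 4 cases, each pooled comparison list has 6, and each overlap is 3, so every estimate is 8.0, against 7 distinct cases observed. With real data the estimates will generally differ from one another, and the spread among them gives a rough indication of how sensitive the result is to the choice of source.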