Analyses of Case–Control Data for Additional Outcomes

Abstract
Consider a case-control study in which prevalent cases of a given disease define the index series and members of the base population without the disease are sampled to provide the referent series. Information on a set of explanatory variables (e.g., genotypes) is collected, at great cost, for cases and controls. The objective of the study is to evaluate the relationship between case status and the explanatory variables. Subsequently, an investigator notes that the presence or absence of a second disease was also recorded for the members of the index and referent series. The investigator wishes to make efficient use of the available data by assessing the relationship between this second disease and the explanatory variables. In this paper, we discuss 2 analytic approaches that might be used to assess associations between the explanatory variables and an outcome other than the original disease. The first includes a design variable for original disease status as a covariate; the second uses weighted logistic regression with the inverses of the sampling fractions as the weights. The latter approach allows the investigator to estimate the association between the explanatory variables and the second disease without adjusting for the first disease. Weighted logistic regression is readily implemented in available statistical packages.
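The two approaches can be illustrated with a minimal sketch. It assumes a single binary explanatory variable, a hypothetical base-population table `POP` of counts by genotype `x`, original disease `d1`, and second disease `d2`, and an idealized design (`FRACTION`) in which all cases and exactly 10% of noncases are sampled. All of these names and numbers are hypothetical. The Newton-Raphson routine `fit_logistic` stands in for a statistical package's weighted logistic regression; it is not the paper's software. The sketch shows three fits: the inverse-sampling-fraction weighted fit (which here reproduces the population odds ratio for `d2` and `x`), the fit with `d1` as a design covariate, and a naive unweighted fit.

```python
import numpy as np


def fit_logistic(X, y, weights=None, n_iter=50, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson.

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes.
    weights: (n,) nonnegative observation weights; None means all ones.
    Returns the coefficients maximizing the weighted log-likelihood.
    """
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))              # fitted probabilities
        grad = X.T @ (w * (y - mu))                       # weighted score
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical base-population counts indexed by (x, d1, d2).
POP = {
    (0, 0, 0): 8000, (0, 0, 1): 1000,
    (0, 1, 0): 100,  (0, 1, 1): 100,
    (1, 0, 0): 4000, (1, 0, 1): 1000,
    (1, 1, 0): 200,  (1, 1, 1): 400,
}
# Assumed sampling fractions: every case of d1, 10% of noncases.
FRACTION = {1: 1.0, 0: 0.1}


def draw_case_control_sample():
    """Idealized sample: each cell is drawn exactly at its sampling fraction."""
    rows = []
    for (x, d1, d2), count in POP.items():
        rows.extend([(x, d1, d2)] * round(count * FRACTION[d1]))
    return np.array(rows, dtype=float)


def population_log_or():
    """Log odds ratio for d2 versus x in the full population, ignoring d1."""
    a = sum(POP[(1, d1, 1)] for d1 in (0, 1))
    b = sum(POP[(1, d1, 0)] for d1 in (0, 1))
    c = sum(POP[(0, d1, 1)] for d1 in (0, 1))
    d = sum(POP[(0, d1, 0)] for d1 in (0, 1))
    return np.log(a * d / (b * c))


sample = draw_case_control_sample()
x, d1, d2 = sample[:, 0], sample[:, 1], sample[:, 2]
ones = np.ones_like(x)

# Weights are the inverses of the sampling fractions (1 for cases, 10 for noncases).
ipw = np.where(d1 == 1, 1.0 / FRACTION[1], 1.0 / FRACTION[0])

# Weighted logistic regression of d2 on x, with no adjustment for d1.
beta_weighted = fit_logistic(np.column_stack([ones, x]), d2, ipw)
# Unweighted regression with d1 included as a design covariate.
beta_adjusted = fit_logistic(np.column_stack([ones, x, d1]), d2)
# Naive unweighted regression that ignores the sampling design.
beta_naive = fit_logistic(np.column_stack([ones, x]), d2)

print("population log OR:", population_log_or())
print("weighted log OR:  ", beta_weighted[1])
print("d1-adjusted log OR:", beta_adjusted[1])
print("naive log OR:     ", beta_naive[1])
```

Because the weighted sample counts rebuild the population table exactly in this idealized design, the weighted estimate matches the marginal population log odds ratio. The naive fit does not, because cases of the first disease are oversampled. The covariate-adjusted fit targets a different quantity, the association conditional on `d1`. The sketch returns point estimates only; with sampling weights, standard errors generally call for a robust (sandwich) variance estimator, which is not shown.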