Secondary analysis of case-control data
- 11 October 2005
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 25 (8), 1323–1339
- https://doi.org/10.1002/sim.2283
Abstract
We extend the discussion of Lee et al. (Stat. Med. 1997; 16:1377–1389) and others on methods for performing secondary analyses of case‐control sampled data and carry out an extensive investigation of efficiency and robustness. We find that, with the exception of the ‘analyse‐the‐controls‐only’ strategy for populations in which cases are rare, ad hoc methods in common use often lead to extremely misleading conclusions, and that it is not possible to tell in advance when this will happen. Weighted likelihood and semi‐parametric maximum likelihood (SPML) methods are justified theoretically. We find that SPML can be as much as twice as efficient as the weighted method but is subject to bias in estimating parameters of interest when the nuisance models it requires have been misspecified. The weighted method needs no nuisance models and is thus robust in this regard, but we cannot tell when it is going to be very inefficient without sophisticated modelling such as that required by the SPML method. Practitioners should routinely use both methods and will often have to weigh up the practical consequences of severe inefficiency and lack of robustness in the context of their enquiries. Copyright © 2005 John Wiley & Sons, Ltd.
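To make the contrast between the ad hoc, controls-only and weighted strategies concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation and it omits the SPML method entirely. The population model, its parameter values, the sample sizes and the function names (`fit_weighted_logistic`, `secondary_analysis_demo`) are all hypothetical. The weighted fit multiplies each sampled case's log-likelihood contribution by N1/n1 and each sampled control's by N0/n0 (population stratum size over sampled stratum size), the usual inverse-sampling-probability form of weighted likelihood.

```python
import math
import random


def fit_weighted_logistic(xs, ys, ws, iters=50, tol=1e-10):
    """Weighted ML fit of logit P(y=1 | x) = a + b*x via Newton-Raphson.

    Observation i contributes ws[i] times its log-likelihood, so (a, b)
    solve the weighted score equations.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = w * (y - p)  # weighted residual, accumulated into the score
            g0 += r
            g1 += r * x
            v = w * p * (1.0 - p)  # weighted variance, accumulated into the information
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ (da, db) = (g0, g1).
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def secondary_analysis_demo(seed=1, pop_size=100_000, n_cases=1000, n_controls=2000):
    """Compare three analyses of a secondary outcome Y2 on X in a case-control sample.

    Hypothetical population model (all parameters are illustrative):
      X  ~ Bernoulli(0.5)
      logit P(Y2=1 | X)    = -1.0 + 0.5*X            target: (a, b) = (-1.0, 0.5)
      logit P(D=1 | X, Y2) = -4.0 + 1.0*X + 1.5*Y2   primary disease D
    """
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(pop_size):
        x = 1 if rng.random() < 0.5 else 0
        y2 = 1 if rng.random() < expit(-1.0 + 0.5 * x) else 0
        d = 1 if rng.random() < expit(-4.0 + 1.0 * x + 1.5 * y2) else 0
        (cases if d else controls).append((x, y2))

    # Case-control sampling: simple random samples within each disease stratum.
    s_cases = rng.sample(cases, n_cases)
    s_controls = rng.sample(controls, n_controls)
    sample = s_cases + s_controls
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]

    # Inverse sampling-probability weights: N_d / n_d within stratum d.
    w_case = len(cases) / n_cases
    w_control = len(controls) / n_controls
    ws = [w_case] * n_cases + [w_control] * n_controls

    return {
        # Ad hoc: treat the case-control sample as if it were a random sample.
        "naive": fit_weighted_logistic(xs, ys, [1.0] * len(sample)),
        # Weighted likelihood: reweight each stratum back to the population.
        "weighted": fit_weighted_logistic(xs, ys, ws),
        # Controls only: intended for populations in which the disease is rare.
        "controls_only": fit_weighted_logistic(
            [x for x, _ in s_controls], [y for _, y in s_controls], [1.0] * n_controls
        ),
    }


if __name__ == "__main__":
    for name, (a, b) in secondary_analysis_demo().items():
        print(f"{name:>14}: a = {a:+.3f}, b = {b:+.3f}  (generating values: a = -1.000, b = +0.500)")
```

Running the script prints the three (a, b) estimates next to the generating values. Because D depends on both X and Y2 in this hypothetical model, the naive fit tends to be distorted. The controls-only fit is only approximately right here, since the simulated disease is not very rare (a prevalence of roughly 6 to 7 percent under these parameters). The weighted fit targets the population relationship, but the large control weights reduce its effective sample size and so inflate its variance.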
This publication has 9 references indexed in Scilit, of which the following 8 are listed:
- On the robustness of weighted methods for fitting models to case-control data. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2002
- The analysis of retrospective family studies. Biometrika, 2002
- Risk factors for small-for-gestational-age babies: the Auckland Birthweight Collaborative Study. Journal of Paediatrics and Child Health, 2001
- Maximum likelihood for generalised case-control studies. Journal of Statistical Planning and Inference, 2001
- Re-using data from case-control studies. Statistics in Medicine, 1997
- Logistic regression in case-control studies: the effect of using independent as dependent variables. Statistics in Medicine, 1995
- A parametric model for cluster correlated categorical data. Published by JSTOR, 1994
- Generating random binary deviates having fixed marginal distributions and specified degrees of association. The American Statistician, 1993