Application of DNA pooling to large studies of disease

Abstract
Large collections of individuals are required to investigate the association of commonly occurring genetic variation with disease. Laboratory assessment of one form of variation, single nucleotide polymorphisms (SNPs), is costly in both time and DNA. Robust statistical approaches are developed to allow the successful implementation of a recently described laboratory method for rapidly estimating allele frequency using pools of DNA. Simulation demonstrates that incorporating measurement error into the confidence limits for a case–control comparison substantially reduces the Type I error; the approach is illustrated on a case–control study of acute leukaemia in adults. A method for creating multiple sub-pools is described that will allow large studies, such as the proposed U.K. Biobank, to take advantage of DNA pooling. Furthermore, a set-based logistic regression is presented that allows the investigation of joint effects and interactions with other genes or environmental factors. Copyright © 2004 John Wiley & Sons, Ltd.
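The central statistical point is that measurement error in pooled allele-frequency estimates should widen the confidence limits. The sketch below illustrates that idea under assumed conditions and is not the authors' formulation. It uses a normal approximation and treats the variance of each pool's estimate as binomial sampling variance over 2N chromosomes plus the variance of the mean of replicate pool measurements. The function names (`pool_variance`, `diff_ci`) and the example readings are hypothetical.

```python
import math
from statistics import mean, variance


def pool_variance(measurements, n_individuals):
    """Return (estimate, sampling variance, measurement variance) for one pool.

    Assumed model: the allele frequency is the mean of replicate readings of
    the same pool; sampling variance is p(1-p)/(2N) for N diploid individuals;
    measurement variance is the replicate variance divided by the number of
    replicates averaged (requires at least two replicates).
    """
    p = mean(measurements)
    sampling = p * (1 - p) / (2 * n_individuals)
    measurement = variance(measurements) / len(measurements)
    return p, sampling, measurement


def diff_ci(case_meas, n_cases, control_meas, n_controls,
            z=1.96, include_measurement=True):
    """Normal-approximation CI for the case minus control allele-frequency difference."""
    p1, s1, m1 = pool_variance(case_meas, n_cases)
    p2, s2, m2 = pool_variance(control_meas, n_controls)
    var = s1 + s2
    if include_measurement:
        # Adding measurement error widens the interval (the key idea sketched here).
        var += m1 + m2
    half = z * math.sqrt(var)
    d = p1 - p2
    return d - half, d + half


if __name__ == "__main__":
    # Hypothetical replicate readings of one case pool and one control pool.
    cases = [0.31, 0.34, 0.29, 0.33]
    controls = [0.28, 0.30, 0.27, 0.31]
    print("with measurement error:   ", diff_ci(cases, 500, controls, 500))
    print("sampling variance only:   ",
          diff_ci(cases, 500, controls, 500, include_measurement=False))
```

For the same data, omitting the measurement-error term gives a narrower interval, which is how ignoring it can inflate the Type I error.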