On the Optimal Design of Genetic Variant Discovery Studies
- 27 January 2010
- journal article
- Published by Walter de Gruyter GmbH in Statistical Applications in Genetics and Molecular Biology
- Vol. 9 (1), Article 33
- https://doi.org/10.2202/1544-6115.1581
Abstract
The recent emergence of massively parallel sequencing technologies has enabled an increasing number of human genome re-sequencing studies, notable among them being the 1000 Genomes Project. The main aim of these studies is to identify the yet unknown genetic variants in a genomic region, mostly low frequency variants (frequency less than 5%). We propose here a set of statistical tools that address how to optimally design such studies in order to increase the number of genetic variants we expect to discover. Within this framework, the tradeoff between lower coverage for more individuals and higher coverage for fewer individuals can be naturally solved. The methods here are also useful for estimating the number of genetic variants missed in a discovery study performed at low coverage. We show applications to simulated data based on coalescent models and to sequence data from the ENCODE project. In particular, we show the extent to which combining data from multiple populations in a discovery study may increase the number of genetic variants identified relative to studies on single populations.