A Population Genetic Hidden Markov Model for Detecting Genomic Regions Under Selection
Open Access
- 25 February 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 27 (7), 1673-1685
- https://doi.org/10.1093/molbev/msq053
Abstract
Recently, hidden Markov models have been applied to numerous problems in genomics. Here, we introduce an explicit population genetics hidden Markov model (popGenHMM) that uses single nucleotide polymorphism (SNP) frequency data to identify genomic regions that have experienced recent selection. Our popGenHMM assumes that SNP frequencies are emitted independently following diffusion approximation expectations but that neighboring SNP frequencies are partially correlated by selective state. We give results from the training and application of our popGenHMM to a set of early release data from the Drosophila Population Genomics Project (dpgp.org) that consists of approximately 7.8 Mb of resequencing from 32 North American Drosophila melanogaster lines. These results demonstrate the potential utility of our model, making predictions based on the site frequency spectrum (SFS) for regions of the genome that represent selected elements.
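The structure described in the abstract (SNP frequencies emitted independently given a hidden selective state, with neighboring SNPs linked only through that state) can be illustrated with a small two-state HMM and Viterbi decoding. The sketch below is not the authors' popGenHMM. It assumes two states, uses the standard neutral expectation (probability proportional to 1/i for derived-allele count i) for the neutral emissions, and substitutes a hypothetical spectrum proportional to 1/i^2 as a placeholder for a selected state in place of the diffusion-approximation expectation. The transition probability `STAY` and all function names are likewise illustrative assumptions.

```python
import math

N_LINES = 32                  # sample size mentioned in the abstract
COUNTS = range(1, N_LINES)    # derived-allele counts 1..31 at segregating sites
STAY = 0.95                   # hypothetical probability of keeping the same state


def normalized(weights):
    """Scale a list of positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# State 0: neutral, standard neutral SFS (proportional to 1/i).
# State 1: "selected", a placeholder spectrum skewed toward rare alleles.
NEUTRAL = normalized([1.0 / i for i in COUNTS])
SELECTED = normalized([1.0 / i ** 2 for i in COUNTS])  # assumption, not a diffusion result

LOG_EMIT = [[math.log(p) for p in NEUTRAL], [math.log(p) for p in SELECTED]]
LOG_TRANS = [
    [math.log(STAY), math.log(1.0 - STAY)],
    [math.log(1.0 - STAY), math.log(STAY)],
]
LOG_INIT = [math.log(0.5), math.log(0.5)]


def viterbi(obs, log_init, log_trans, log_emit):
    """Return the most likely hidden-state path for a list of observation indices."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this SNP.
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def decode(derived_counts):
    """Label each SNP (given its derived-allele count) as 0 = neutral or 1 = selected."""
    obs = [c - 1 for c in derived_counts]  # count i maps to emission index i - 1
    return viterbi(obs, LOG_INIT, LOG_TRANS, LOG_EMIT)
```

With these assumed parameters, calling `decode` on a run of intermediate-frequency SNPs followed by a run of singletons labels the first run as state 0 and the second as state 1, because a sustained excess of rare alleles outweighs the cost of one state switch. A fitted model would estimate the transition and emission parameters from data instead of fixing them by hand.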