SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data
Open Access
- 24 July 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 7 (7) , e37558
- https://doi.org/10.1371/journal.pone.0037558
Abstract
We present a statistical framework for estimating sample allele frequency spectra from New-Generation Sequencing (NGS) data and for applying them in downstream inference. In this method, we first estimate the allele frequency spectrum by maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method to estimate the sample allele frequency at a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other settings, including deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
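The pipeline summarized in the abstract can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. Genotype likelihoods p(X_j | G_j = g) are taken as given, with g counting derived alleles. Individuals are assumed diploid and exchangeable, as under Hardy-Weinberg equilibrium. The spectrum is fitted with EM, a swapped-in optimizer, rather than the analytical-derivative numerical optimization the paper describes. The function names (`site_likelihoods`, `estimate_sfs`, `posterior_k`, `call_snp`, `genotype_posterior`) and the 0.95 calling threshold are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical input format: gls[j][g] = p(X_j | G_j = g), the genotype likelihood
# of individual j for g = 0, 1, 2 copies of the derived allele at one site.


def site_likelihoods(gls: Sequence[Sequence[float]]) -> List[float]:
    """Return p(X | K = k) for k = 0..2n at one site by dynamic programming.

    Assumes exchangeable diploids: given k derived alleles among the 2n sampled
    chromosomes, P(G = g | K = k) = prod_j C(2, g_j) / C(2n, k).
    No rescaling is done, so very large n can underflow.
    """
    # z[k]: sum, over genotype vectors of the individuals seen so far whose
    # derived-allele total is k, of prod_j C(2, g_j) * p(X_j | g_j).
    z = [1.0]
    for gl in gls:
        nxt = [0.0] * (len(z) + 2)
        for k, val in enumerate(z):
            nxt[k] += val * gl[0]            # homozygous ancestral
            nxt[k + 1] += 2.0 * val * gl[1]  # heterozygous: C(2, 1) = 2
            nxt[k + 2] += val * gl[2]        # homozygous derived
        z = nxt
    two_n = len(z) - 1
    return [z[k] / math.comb(two_n, k) for k in range(two_n + 1)]


def estimate_sfs(site_liks: Sequence[Sequence[float]], iters: int = 500,
                 tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood spectrum phi_k over sites, fitted here by EM.

    Assumes every site has positive likelihood under the current phi
    (otherwise the division below fails).
    """
    m = len(site_liks[0])
    phi = [1.0 / m] * m
    for _ in range(iters):
        acc = [0.0] * m
        for lik in site_liks:
            w = [p * l for p, l in zip(phi, lik)]
            tot = sum(w)
            for k in range(m):
                acc[k] += w[k] / tot  # P(K = k | X_s) under the current phi
        new = [a / len(site_liks) for a in acc]
        if max(abs(a - b) for a, b in zip(new, phi)) < tol:
            return new
        phi = new
    return phi


def posterior_k(lik: Sequence[float], phi: Sequence[float]) -> List[float]:
    """Bayesian posterior P(K = k | X) at one site, with the spectrum as prior."""
    w = [p * l for p, l in zip(phi, lik)]
    tot = sum(w)
    return [x / tot for x in w]


def call_snp(lik: Sequence[float], phi: Sequence[float],
             threshold: float = 0.95) -> bool:
    """Call the site variable in the sample if P(0 < K < 2n | X) >= threshold."""
    post = posterior_k(lik, phi)
    return 1.0 - post[0] - post[-1] >= threshold


def genotype_posterior(gls: Sequence[Sequence[float]], i: int,
                       phi: Sequence[float]) -> List[float]:
    """P(G_i = g | X) for g = 0, 1, 2, using phi as the prior on K.

    Given K = k, G_i is hypergeometric: C(2, g) C(2n - 2, k - g) / C(2n, k),
    and the other individuals then carry k - g derived alleles.
    """
    rest = site_likelihoods([gl for j, gl in enumerate(gls) if j != i])
    two_n = 2 * len(gls)
    post = [0.0, 0.0, 0.0]
    for g in range(3):
        for k in range(g, two_n - 2 + g + 1):
            p_g_given_k = (math.comb(2, g) * math.comb(two_n - 2, k - g)
                           / math.comb(two_n, k))
            post[g] += phi[k] * p_g_given_k * rest[k - g]
        post[g] *= gls[i][g]
    tot = sum(post)
    return [p / tot for p in post]
```

In use, `estimate_sfs([site_likelihoods(g) for g in all_sites])` would give a spectrum for the whole region. That spectrum then serves as the prior in `call_snp` and `genotype_posterior`. Here `all_sites` is a placeholder for per-site lists of genotype likelihoods. Because per-site likelihoods are computed once and reused, the fitting step never revisits individual genotypes, which is what makes the dynamic programming step worthwhile.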