Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies
Open Access
- 29 July 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (21), 7400-7409
- https://doi.org/10.1093/nar/gkq655
Abstract
Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples in unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and inflate estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb fragment of the HIV-1 gag/pol gene in two control and two clinical samples, and we analysed the effect of PCR amplification. Error correction resulted in a two- and five-fold decrease in the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. The genetic diversity observed within and between the two clinical samples resulted in various patterns of phenotypic drug resistance and suggested a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.
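To see why the substitution-rate reduction matters for detecting clones at 0.1%, consider a deliberately simple per-site check. This is a minimal sketch and a swapped-in technique: a frequentist binomial test, not the authors' probabilistic Bayesian method. It asks whether the number of reads supporting a minority base exceeds what sequencing error alone would plausibly produce. The error rates 0.25% and 0.05% come from the abstract. The coverage of 10,000 reads, the significance level `alpha`, the even split of substitution errors across the three alternative bases, and the helper names `binom_upper_tail` and `is_minority_variant` are all hypothetical choices made for illustration.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i); lgamma avoids huge integers
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def is_minority_variant(alt_count: int, coverage: int,
                        sub_error_rate: float, alpha: float = 1e-3):
    """Hypothetical per-site call: is alt_count unlikely under error alone?

    Assumption (not from the source): substitution errors are spread evenly
    over the three alternative bases, so one specific wrong base appears at
    rate sub_error_rate / 3.
    """
    per_base_error = sub_error_rate / 3
    p_value = binom_upper_tail(alt_count, coverage, per_base_error)
    return p_value < alpha, p_value


if __name__ == "__main__":
    coverage = 10_000                      # hypothetical read depth at one site
    alt_reads = round(coverage * 0.001)    # a 0.1% clone gives about 10 reads
    for rate in (0.0025, 0.0005):          # rates before/after correction (PCR samples)
        called, p = is_minority_variant(alt_reads, coverage, rate)
        print(f"error rate {rate:.2%}: called={called}, p={p:.2e}")
```

Under these assumptions, about 8.3 erroneous copies of a given base are expected at the uncorrected 0.25% rate, so 10 supporting reads are indistinguishable from noise. At 0.05% only about 1.7 are expected, and the same 10 reads become a confident call. A real pipeline would also have to handle position-specific error profiles and linkage across sites, which this single-site sketch ignores.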