Viral population analysis and minority-variant detection using short read next-generation sequencing
Open Access
- 19 March 2013
- journal article
- research article
- Published by The Royal Society in Philosophical Transactions Of The Royal Society B-Biological Sciences
- Vol. 368 (1614) , 20120205
- https://doi.org/10.1098/rstb.2012.0205
Abstract
RNA viruses within infected individuals exist as a population of evolutionarily related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated at unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr), which is specific for virus genome short read analysis, minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and a reduction in erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza, with samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro.
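As a rough illustration of what post-mapping minority-variant analysis involves, the sketch below reports non-consensus bases at each mapped position whose frequency exceeds a cutoff. It is not QUASR's implementation: the function name `minority_variants`, the input layout (one string of observed bases per reference position) and the `min_depth` and `min_freq` defaults are all assumptions chosen for illustration.

```python
from collections import Counter


def minority_variants(columns, min_depth=100, min_freq=0.01):
    """Return (position, consensus_base, variant_base, frequency) tuples.

    columns   -- hypothetical input: one string per reference position,
                 holding every base observed in reads mapped there
    min_depth -- skip positions with fewer usable bases than this
    min_freq  -- report only non-consensus bases at or above this frequency
    """
    calls = []
    for pos, bases in enumerate(columns, start=1):
        # Count only unambiguous nucleotides; ignore N, gaps, etc.
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to estimate frequencies
        consensus, _ = counts.most_common(1)[0]
        for base, n in counts.items():
            freq = n / depth
            if base != consensus and freq >= min_freq:
                calls.append((pos, consensus, base, freq))
    return calls


if __name__ == "__main__":
    # Toy pileup: position 1 has a 5% G variant, position 2 is uniform,
    # position 3 is below the depth cutoff.
    toy = ["A" * 95 + "G" * 5, "C" * 100, "T" * 10]
    for call in minority_variants(toy, min_depth=50, min_freq=0.02):
        print(call)
```

In practice the frequency cutoff would need to sit above the platform's residual error rate after quality control, which is why error reduction before mapping matters for low-frequency calls.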