ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences
Open Access
- 5 May 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 37 (10) , e76
- https://doi.org/10.1093/nar/gkp285
Abstract
Recent metagenomics studies of environmental samples suggested that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is used for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is for large-scale problems and is able to analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that clearly demonstrate the effectiveness of the newly proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.