Genovo: De Novo Assembly for Metagenomes
- 1 March 2011
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 18 (3), 429-443
- https://doi.org/10.1089/cmb.2010.0244
Abstract
Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A nonparametric prior accounts for the unknown number of genomes in the sample. Inference is performed by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs in a series of synthetic experiments and across nine metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo's reconstructions cover more bases and recover more genes than the other methods, even for low-abundance sequences, and yield a higher assembly score. Supplementary Material is available at www.liebertoinline.com/cmb.
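To illustrate how a nonparametric prior can leave the number of genomes open, the following minimal sketch draws read-to-sequence assignments from a Chinese restaurant process. The abstract does not name the prior, so the choice of a Chinese restaurant process here is an assumption, and the names `crp_assignments`, `n_reads`, and `alpha` are hypothetical. This is not Genovo's implementation, and it models only the partition of reads, not the sequences or the noise.

```python
import random


def crp_assignments(n_reads, alpha, rng):
    """Draw a partition of n_reads items from a Chinese restaurant process.

    Each read joins an existing cluster with probability proportional to that
    cluster's current size, or opens a new cluster with probability
    proportional to alpha (assumed > 0).
    """
    assignments = []  # assignments[i] = cluster index of read i
    sizes = []        # sizes[k] = number of reads currently in cluster k
    for _ in range(n_reads):
        # The last weight corresponds to "open a new cluster".
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments, sizes


# Illustrative run with made-up settings (1000 reads, alpha = 1.0).
rng = random.Random(0)
assignments, sizes = crp_assignments(n_reads=1000, alpha=1.0, rng=rng)
print(len(sizes), "clusters; largest sizes:", sorted(sizes, reverse=True)[:5])
```

Under such a prior the number of clusters is not fixed in advance: it grows slowly with the number of reads, and larger `alpha` values favor more clusters. A full assembler would combine a prior of this kind with a likelihood for the reads and search for a high-probability assignment, for example by the iterative hill-climbing steps the abstract mentions.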