FragGeneScan: predicting genes in short and error-prone reads
Open Access
- 28 August 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 38 (20) , e191
- https://doi.org/10.1093/nar/gkq747
Abstract
Advances in next-generation sequencing technology have facilitated metagenomics research, which attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identifying genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since an assembly of the metagenome is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and those recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as sequencing error rates increase or as reads get shorter. We have developed a novel gene prediction method, FragGeneScan, which combines sequencing error models and codon usages in a hidden Markov model to improve the prediction of protein-coding regions in short reads. The performance of FragGeneScan was comparable to that of Glimmer and MetaGene for complete genomes. For short reads, however, FragGeneScan consistently outperformed MetaGene (accuracy improved ∼62% for reads of 400 bases with 1% sequencing errors, and ∼18% for short reads of 100 bases that are error free). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene predicted (>90% of the genes identified by homology search), as well as many novel genes with no homologs in current protein sequence databases.
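To make the hidden Markov model idea concrete, the sketch below runs Viterbi decoding on a toy two-state model that labels each base of a read as coding ("C") or noncoding ("N"). It is not FragGeneScan's model: the actual method combines codon usage with sequencing error models, whereas this illustration works on single bases and ignores errors entirely. The state names, the function name `viterbi`, and every probability are assumptions chosen only for illustration.

```python
import math

# Toy two-state HMM. All probabilities below are illustrative assumptions,
# not parameters taken from FragGeneScan.
STATES = ("N", "C")  # N = noncoding, C = coding
START = {"N": 0.5, "C": 0.5}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # GC-rich "coding" state
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # Work in log space so long reads do not underflow.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

Under these toy parameters, `viterbi("GCGCGCGCGCGC")` labels every base as coding and `viterbi("ATATATATATAT")` labels every base as noncoding. A model of the kind the abstract describes would replace the per-base emissions with codon-level statistics and add states that account for sequencing errors.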