Combining diverse evidence for gene recognition in completely sequenced bacterial genomes
Open Access
- 1 June 1998
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 26 (12) , 2941-2947
- https://doi.org/10.1093/nar/26.12.2941
Abstract
Analysis of a newly sequenced bacterial genome starts with identification of protein-coding genes. Functional assignment of proteins requires the exact knowledge of protein N-termini. We present a new program ORPHEUS that identifies candidate genes and accurately predicts gene starts. The analysis starts with a database similarity search and identification of reliable gene fragments. The latter are used to derive statistical characteristics of protein-coding regions and ribosome-binding sites and to predict the complete set of genes in the analyzed genome. In a test on Bacillus subtilis and Escherichia coli genomes, the program correctly identified 93.3% (resp. 96.3%) of experimentally annotated genes longer than 100 codons described in the PIR-International database, and for these genes 96.3% (83.9%) of starts were predicted exactly. Furthermore, 98.9% (99.1%) of genes longer than 100 codons annotated in GenBank were found, and 92.9% (75.7%) of predicted starts coincided with the feature table description. Finally, for the complete gene complements of B.subtilis and E.coli, including genes shorter than 100 codons, gene prediction accuracy was 88.9 and 87.1%, respectively, with 94.2 and 76.7% starts coinciding with the existing annotation.
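The abstract reports two figures per genome: the share of annotated genes that were found, and, among those, the share whose start was predicted exactly. A minimal Python sketch of how such a pair of figures could be computed follows. It is not ORPHEUS code. The `Gene` record, the `evaluate` function, and the matching rule (a gene counts as found when a prediction on the same strand shares its stop coordinate) are assumptions made for illustration; the abstract does not state the matching criterion used by the authors.

```python
from typing import NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are illustrative only."""
    strand: str  # "+" or "-"
    start: int   # position of the start codon
    stop: int    # position of the stop codon


def evaluate(annotated: Sequence[Gene],
             predicted: Sequence[Gene]) -> Tuple[float, float]:
    """Return (fraction of annotated genes found,
               fraction of found genes whose start matches exactly).

    Assumption: a gene is "found" if some prediction on the same
    strand ends at the same stop coordinate.
    """
    # Index predictions by strand and stop position for direct lookup.
    by_stop = {(g.strand, g.stop): g for g in predicted}

    found = 0
    exact_start = 0
    for gene in annotated:
        hit = by_stop.get((gene.strand, gene.stop))
        if hit is None:
            continue  # annotated gene missed by the predictor
        found += 1
        if hit.start == gene.start:
            exact_start += 1

    gene_rate = found / len(annotated) if annotated else 0.0
    start_rate = exact_start / found if found else 0.0
    return gene_rate, start_rate


# Toy data, invented for this example.
annotated = [Gene("+", 10, 100), Gene("+", 200, 400),
             Gene("-", 900, 500), Gene("+", 1000, 1300)]
predicted = [Gene("+", 10, 100), Gene("+", 230, 400),
             Gene("-", 900, 500)]

gene_rate, start_rate = evaluate(annotated, predicted)
```

In the toy data, three of the four annotated genes share a stop with a prediction and two of those three also share the start. Under this scheme the start figure is conditional on the gene having been found, which mirrors the abstract's phrasing "for these genes".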