A novel method for accurate operon predictions in all sequenced prokaryotes
Open Access
- 18 February 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (3) , 880-892
- https://doi.org/10.1093/nar/gki232
Abstract
We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E.coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E.coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H.pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC 6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
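As a rough illustration of the "distance separating adjacent genes" feature, the sketch below computes intergenic distances for adjacent same-strand gene pairs and applies a fixed cutoff. It is a simplified stand-in, not the method of the paper: the `Gene` record, the function names, and the 50 bp default cutoff are all assumptions made for illustration. The abstract describes distance models that are tailored to each genome and combined with comparative genomic measures, neither of which is attempted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def adjacent_same_strand_pairs(genes):
    """Return (upstream, downstream, distance) for adjacent same-strand genes.

    Distance is the number of bases between the two genes; it is negative
    when the genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        # Only genes on the same strand can share a transcript.
        if left.strand == right.strand:
            pairs.append((left, right, right.start - left.end - 1))
    return pairs


def predict_same_operon(distance, max_distance=50):
    """Distance-only call with an arbitrary cutoff (assumed, not from the paper)."""
    return distance <= max_distance
```

A fixed cutoff like this is exactly what the abstract argues against for genomes such as Halobacterium NRC-1 and H.pylori, where within-operon spacing differs from E.coli, so `max_distance` would need to be estimated per genome in any realistic use.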