Single-pass classification of all noncoding sequences in a bacterial genome using phylogenetic profiles
- 23 February 2009
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 19 (6), 1084-1092
- https://doi.org/10.1101/gr.089714.108
Abstract
Identification and characterization of functional elements in the noncoding regions of genomes is an elusive and time-consuming activity whose output does not keep up with the pace of genome sequencing. Hundreds of bacterial genomes lie unexploited in terms of noncoding sequence analysis, although they may conceal a wide diversity of novel RNA genes, riboswitches, or other regulatory elements. We describe a strategy that exploits the entirety of available bacterial genomes to classify all noncoding elements of a selected reference species in a single pass. This method clusters noncoding elements based on their profile of presence among species. Most noncoding RNAs (ncRNAs) display specific signatures that enable their grouping in distinct clusters, away from sequence conservation noise and other elements such as promoters. We submitted 24 ncRNA candidates from Staphylococcus aureus to experimental validation and confirmed the presence of seven novel small RNAs or riboswitches. Besides offering a powerful method for de novo ncRNA identification, the analysis of phylogenetic profiles opens a new path toward the identification of functional relationships between co-evolving coding and noncoding elements.
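The idea of grouping elements by their profile of presence among species can be illustrated with a minimal sketch. The abstract does not state which distance measure or clustering algorithm the authors used, so everything below is an assumption chosen for illustration: presence/absence is encoded as a binary tuple per element, profiles are compared with the Jaccard distance, and elements are grouped by simple single-linkage thresholding. The element names, the toy profiles, and the `max_distance` cutoff are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


def cluster_profiles(profiles, max_distance=0.5):
    """Group elements whose profiles lie within max_distance (single linkage).

    This is an illustrative stand-in, not the clustering used in the paper.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: each position is one genome (1 = element present).
profiles = {
    "elem_A": (1, 1, 1, 0, 0, 0),
    "elem_B": (1, 1, 1, 1, 0, 0),
    "elem_C": (0, 0, 0, 0, 1, 1),
    "elem_D": (0, 0, 0, 1, 1, 1),
}

clusters = cluster_profiles(profiles)
print(clusters)
```

In this toy input, elements A and B share most of the genomes in which they occur, as do C and D, so they fall into two separate groups. A real analysis would derive the profiles from sequence similarity searches across many genomes and would likely need a more robust clustering method.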