A Computational Pipeline for High-Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes
Open Access
- 6 July 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 3 (7), e126
- https://doi.org/10.1371/journal.pcbi.0030126
Abstract
Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair–level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.
For decades, scientists believed that, with a few key exceptions, RNA played a secondary role in the cell. Recent discoveries have sharply revised this simple picture, revealing widespread, diverse, and surprisingly sophisticated roles for RNA. For example, many bacteria use RNA elements called “riboswitches” to switch various gene activities on or off in response to extremely sensitive detection of specific molecules. Discovery of new functional RNA elements remains a very challenging task, both computationally and experimentally. It is computationally difficult largely because of the importance of an RNA molecule's 3-D structure, and the fact that molecules with very different nucleotide sequences can fold into the same shape. In this paper, we propose a computational procedure, based on comparing the genomes of multiple bacteria, for discovery of novel RNAs. Unlike most previous approaches, ours does not require a letter-by-letter alignment of these diverse genomes, making it more applicable to RNA elements whose structure, but not nucleotide sequence, has been preserved through evolution. In an extensive test on the Firmicutes, a bacterial phylum containing well-studied organisms such as Bacillus subtilis and important pathogens such as anthrax, we recover most known noncoding RNA elements, as well as making many novel predictions.
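The base-pair–level accuracy figures quoted in the abstract can be made concrete with a small illustration. The sketch below is not the authors' evaluation code; it assumes structures are given in dot-bracket notation without pseudoknots, and it assumes that "specificity" means the fraction of predicted pairs that are correct (positive predictive value), a convention common in RNA structure benchmarks but not confirmed by this abstract. The function names `parse_pairs` and `base_pair_accuracy` are hypothetical.

```python
def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def base_pair_accuracy(reference, predicted):
    """Compare a predicted structure with a reference structure.

    Returns (sensitivity, specificity), where specificity is taken here
    to be the positive predictive value (an assumption, see lead-in).
    """
    ref, pred = parse_pairs(reference), parse_pairs(predicted)
    true_pos = len(ref & pred)                        # pairs found in both
    sensitivity = true_pos / len(ref) if ref else 0.0
    specificity = true_pos / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical example: the prediction misses the innermost reference pair.
sens, spec = base_pair_accuracy("((((...))))", "(((.....)))")
print(sens, spec)
```

In this toy case the prediction recovers three of the four reference pairs and predicts no spurious ones, so averaging such per-motif scores over a benchmark would give summary figures of the kind reported in the abstract.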