Noncoding RNA gene detection using comparative sequence analysis
Open Access
- 10 October 2001
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 2 (1), 8
- https://doi.org/10.1186/1471-2105-2-8
Abstract
Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Unlike protein-coding genes, noncoding RNA gene sequences do not have strong statistical signals. A reliable general-purpose computational genefinder for noncoding RNA genes has been elusive.

Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context-free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g., from a BLASTN comparison of two related genomes), we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class.

Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability.
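The final classification step described in the abstract amounts to Bayes' rule over three competing models. The sketch below assumes the per-model log-likelihoods P(alignment | model) have already been computed by the three pair-grammars (not shown here) and, unless supplied, uses equal class priors. The function names and the class labels "coding", "rna", and "null" are hypothetical illustrations, not QRNA's actual interface.

```python
import math


def class_posteriors(log_likelihoods, log_priors=None):
    """Return P(class | alignment) for each class via Bayes' rule.

    log_likelihoods: dict mapping a class label to the natural-log
        likelihood log P(alignment | model) for that class's model.
    log_priors: optional dict of natural-log prior probabilities.
        Assumption: if omitted, every class gets an equal prior.
    """
    classes = list(log_likelihoods)
    if log_priors is None:
        log_priors = {c: -math.log(len(classes)) for c in classes}

    # Unnormalized log posterior: log P(x | c) + log P(c).
    joint = {c: log_likelihoods[c] + log_priors[c] for c in classes}

    # Log-sum-exp normalization: alignment likelihoods are tiny numbers,
    # so subtract the maximum before exponentiating to avoid underflow.
    m = max(joint.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    return {c: math.exp(joint[c] - log_z) for c in classes}


def classify(log_likelihoods, log_priors=None):
    """Return (best class label, dict of posteriors for all classes)."""
    post = class_posteriors(log_likelihoods, log_priors)
    winner = max(post, key=post.get)
    return winner, post


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one alignment under each model.
    scores = {"coding": -1510.0, "rna": -1502.0, "null": -1505.0}
    label, post = classify(scores)
    print(label, {c: round(p, 4) for c, p in post.items()})
```

With these made-up scores the RNA model wins with a posterior of roughly 0.95. Working in log space matters here because the raw likelihoods of a long alignment would underflow double precision. Normalizing only the differences between model scores keeps the computation stable.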