SimulFold: Simultaneously Inferring RNA Structures Including Pseudoknots, Alignments, and Trees Using a Bayesian MCMC Framework
Open Access
- 10 August 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 3 (8) , e149
- https://doi.org/10.1371/journal.pcbi.0030149
Abstract
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses. Not only is the prediction of evolutionarily conserved RNA structures important for elucidating the potential functions of RNA sequences and the mechanisms by which these functions are exerted, but it also lies at the core of RNA gene prediction. To get an accurate prediction of the conserved RNA structure, we need a high-quality sequence alignment and an evolutionary tree relating several evolutionarily related sequences. These are two strong requirements that are typically difficult to fulfill unless the encoded RNA structure is already known. We present what is to our knowledge the first method that solves this chicken-and-egg problem by co-estimating all three quantities simultaneously. We show that our novel method, called SimulFold, can be successfully applied over a wide range of sequence similarities to detect conserved RNA structures, including those with pseudoknots. We also show its potential as an alignment and phylogeny prediction method. Our method overcomes several significant limitations of existing methods and has the potential to be used for a very diverse range of tasks.Keywords
This publication has 80 references indexed in Scilit:
- Application of Phylogenetic Networks in Evolutionary StudiesMolecular Biology and Evolution, 2005
- Pseudoknots: RNA Structures with Diverse FunctionsPLoS Biology, 2005
- Multiple sequence alignment with the Clustal series of programsNucleic Acids Research, 2003
- Secondary Structure Prediction for Aligned RNA SequencesJournal of Molecular Biology, 2002
- Dynalign: an algorithm for finding the secondary structure common to two RNA sequencesJournal of Molecular Biology, 2002
- A dynamic programming algorithm for RNA structure prediction including pseudoknots 1 1Edited by I. TinocoJournal of Molecular Biology, 1999
- A Polyhedral Approach to RNA Sequence Structure AlignmentJournal of Computational Biology, 1998
- Properties of the nearest neighbor interchange metric for trees of small sizeJournal of Theoretical Biology, 1983
- Evolutionary trees from DNA sequences: A maximum likelihood approachJournal of Molecular Evolution, 1981
- Sequential folding of transfer RNAJournal of Molecular Biology, 1980