A statistical sampling algorithm for RNA secondary structure prediction
Open Access
- 15 December 2003
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 31 (24) , 7280-7301
- https://doi.org/10.1093/nar/gkg938
Abstract
An RNA molecule, particularly a long‐chain mRNA, may exist as a population of structures. Furthermore, multiple structures have been demonstrated to play important functional roles. Thus, a representation of the ensemble of probable structures is of interest. We present a statistical algorithm to sample rigorously and exactly from the Boltzmann ensemble of secondary structures. The forward step of the algorithm computes the equilibrium partition functions of RNA secondary structures with recent thermodynamic parameters. Using conditional probabilities computed with the partition functions in a recursive sampling process, the backward step of the algorithm quickly generates a statistically representative sample of structures. With cubic run time for the forward step, quadratic run time in the worst case for the sampling step, and quadratic storage, the algorithm is efficient for broad applicability. We demonstrate that, by classifying sampled structures, the algorithm enables a statistical delineation and representation of the Boltzmann ensemble. Applications of the algorithm show that alternative biological structures are revealed through sampling. Statistical sampling provides a means to estimate the probability of any structural motif, with or without constraints. For example, the algorithm enables probability profiling of single‐stranded regions in RNA secondary structure. Probability profiling for specific loop types is also illustrated. By overlaying probability profiles, a mutual accessibility plot can be displayed for predicting RNA:RNA interactions. Boltzmann probability‐weighted density of states and free energy distributions of sampled structures can be readily computed. We show that a sample of moderate size from the ensemble of an enormous number of possible structures is sufficient to guarantee statistical reproducibility in the estimates of typical sampling statistics.
Our applications suggest that the sampling algorithm may be well suited to prediction of mRNA structure and target accessibility. The algorithm is applicable to the rational design of small interfering RNAs (siRNAs), antisense oligonucleotides, and trans‐cleaving ribozymes in gene knock‐down studies.
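The forward/backward scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy energy model in which every allowed base pair contributes the same hypothetical energy (`pair_energy`, in kcal/mol) instead of the thermodynamic parameters the paper uses, and a minimum hairpin loop of three unpaired bases. All function names are illustrative. The forward step fills a partition-function table in cubic time; the backward step draws structures using conditional probabilities read from that table; a single-stranded probability profile is then estimated from the sample.

```python
import math
import random

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def q(Q, i, j):
    """Partition function of seq[i..j]; an empty interval has weight 1."""
    return Q[i][j] if i <= j else 1.0


def partition_table(seq, weight):
    """Forward step: Q[i][j] sums Boltzmann weights over structures on i..j.

    Toy model (an assumption): each base pair contributes a factor `weight`.
    """
    n = len(seq)
    Q = [[1.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = q(Q, i, j - 1)  # case: base j is unpaired
            for k in range(i, j - MIN_LOOP):  # case: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    total += q(Q, i, k - 1) * weight * q(Q, k + 1, j - 1)
            Q[i][j] = total
    return Q


def sample_structure(seq, Q, weight, rng):
    """Backward step: draw one structure with its Boltzmann probability."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # interval too short to contain a pair
        r = rng.random() * q(Q, i, j) - q(Q, i, j - 1)
        if r < 0:
            stack.append((i, j - 1))  # j left unpaired
            continue
        chosen = None
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) not in PAIRS:
                continue
            chosen = k  # keeps the last valid k as a rounding fallback
            r -= q(Q, i, k - 1) * weight * q(Q, k + 1, j - 1)
            if r < 0:
                break
        if chosen is None:  # only reachable through floating-point rounding
            stack.append((i, j - 1))
            continue
        pairs.append((chosen, j))
        stack.append((i, chosen - 1))
        stack.append((chosen + 1, j - 1))
    return pairs


def unpaired_profile(seq, n_samples=1000, pair_energy=-1.0, rt=0.6163, seed=0):
    """Estimate, per base, the probability of being single-stranded.

    rt is RT in kcal/mol at about 37 degrees C; pair_energy is a toy value.
    """
    weight = math.exp(-pair_energy / rt)
    Q = partition_table(seq, weight)
    rng = random.Random(seed)
    counts = [0] * len(seq)
    for _ in range(n_samples):
        paired = {pos for pair in sample_structure(seq, Q, weight, rng) for pos in pair}
        for pos in range(len(seq)):
            if pos not in paired:
                counts[pos] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    profile = unpaired_profile("GGGAAAUCCCGCAAGGG", n_samples=1000)
    print(" ".join(f"{p:.2f}" for p in profile))
```

Because the recursion treats base j as either unpaired or paired with exactly one partner k, each structure is counted once, which is what makes the sampled frequencies match Boltzmann probabilities under the assumed model. Overlaying two such profiles, one per molecule, would be one way to approximate the mutual accessibility plot mentioned above.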