Specific alignment of structured RNA: stochastic grammars and sequence annealing
Open Access
- 16 September 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (23), 2677-2683
- https://doi.org/10.1093/bioinformatics/btn495
Abstract
Motivation: Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative probabilistic model of sequence and structure with an efficient sequence annealing approach for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences.
Results: When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages.
Availability: Stemloc-AMA is available from http://biowiki.org/StemLocAMA as part of the dart software package for sequence analysis.
Contact: lpachter@math.berkeley.edu; ihh@berkeley.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
This publication has 33 references indexed in Scilit:
- Fast Pairwise Structural RNA Alignments by Pruning of the Dynamical Programming Matrix. PLoS Computational Biology, 2007
- Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics, 2006
- CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 2006
- A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Research, 2005
- Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics, 2005
- Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 2004
- Biological Sequence Analysis. Published by Cambridge University Press (CUP), 1998
- Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Research, 1997
- Significant Improvement in Accuracy of Multiple Protein Sequence Alignments by Iterative Refinement as Assessed by Reference to Structural Alignments. Journal of Molecular Biology, 1996
- Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 1987