Lightweight comparison of RNAs based on exact sequence–structure matches
Open Access
- 2 February 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (16) , 2095-2102
- https://doi.org/10.1093/bioinformatics/btp065
Abstract
Motivation: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence–structure comparison method which maintains exact matching substructures. Existing common substructures are treated as whole units, while variability is allowed between such structural motifs. Based on a quickly detectable set of overlapping and crossing substructure matches for two nested RNA secondary structures, our method ExpaRNA (exact pattern of alignment of RNA) computes the longest collinear sequence of substructures common to two RNAs in O(H·nm) time and O(nm) space, where H ≪ n·m for real RNA structures. Applied to different RNAs, our method correctly identifies sequence–structure similarities between two RNAs.
Results: We have compared ExpaRNA with two other alignment methods that work with given RNA structures, namely RNAforester and RNA_align. The results are in good agreement, but ExpaRNA obtains them in a fraction of the running time, in particular for larger RNAs. We have also used ExpaRNA to speed up state-of-the-art Sankoff-style alignment tools like LocARNA, and observe a tradeoff between quality and speed. However, we get a speedup of 4.25 even in the highest quality setting, where the quality of the produced alignment is comparable to that of LocARNA alone.
Availability: The presented algorithm is implemented in the program ExpaRNA, which is available from our website (http://www.bioinf.uni-freiburg.de/Software).
Contact: exparna@informatik.uni-freiburg.de, backofen@informatik.uni-freiburg.de
Supplementary information: Supplementary data are available at Bioinformatics online.
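To illustrate what "longest collinear sequence of substructures" can mean, here is a minimal sketch of weighted collinear chaining in Python. It is not the ExpaRNA algorithm: it assumes each match is a contiguous, gap-free block given as a hypothetical `(start_a, start_b, length)` triple with length at least 1, and it ignores the nested base-pair structure that the published method respects. The function name `longest_collinear_chain` and the `Match` type are illustrative assumptions, and the simple quadratic loop over matches does not reproduce the O(H·nm) bound stated in the abstract.

```python
from typing import List, Tuple

# Hypothetical simplified match: (start in RNA A, start in RNA B, length >= 1).
Match = Tuple[int, int, int]


def longest_collinear_chain(matches: List[Match]) -> Tuple[int, List[Match]]:
    """Return (total matched length, chain) for the best chain of matches
    that are non-overlapping and appear in the same order in both RNAs."""
    if not matches:
        return 0, []

    # Sort by position in A; any valid predecessor then has a smaller index.
    ms = sorted(matches, key=lambda m: (m[0], m[1]))
    best = [m[2] for m in ms]   # best[j]: best chain weight ending at ms[j]
    prev = [-1] * len(ms)       # back-pointers for chain reconstruction

    for j, (aj, bj, lj) in enumerate(ms):
        for i in range(j):
            ai, bi, li = ms[i]
            # ms[i] must end before ms[j] starts in both sequences.
            if ai + li <= aj and bi + li <= bj and best[i] + lj > best[j]:
                best[j] = best[i] + lj
                prev[j] = i

    # Trace back from the highest-scoring chain end.
    end = max(range(len(ms)), key=best.__getitem__)
    chain: List[Match] = []
    k = end
    while k != -1:
        chain.append(ms[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain


if __name__ == "__main__":
    example = [(0, 0, 3), (2, 5, 4), (4, 4, 2), (7, 8, 3), (5, 1, 6)]
    total, chain = longest_collinear_chain(example)
    print(total, chain)
```

In this toy input, the match `(5, 1, 6)` is the longest single block but crosses the others, so the chain of three shorter collinear matches is preferred. This mirrors, in a much simplified form, the idea of selecting compatible matches from a set that contains overlapping and crossing candidates.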