CentroidAlign: fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score
Open Access
- 6 October 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (24) , 3236-3243
- https://doi.org/10.1093/bioinformatics/btp580
Abstract
Motivation: The importance of accurate and fast predictions of multiple alignments for RNA sequences has increased due to recent findings about functional non-coding RNAs. Recent studies suggest that maximizing the expected accuracy of predictions will be useful for many problems in bioinformatics.

Results: We designed a novel estimator for multiple alignments of structured RNAs, based on maximizing the expected accuracy of predictions. First, we define the maximum expected accuracy (MEA) estimator for pairwise alignment of RNA sequences. This maximizes the expected sum-of-pairs score (SPS) of a predicted alignment under a probability distribution of alignments given by marginalizing the Sankoff model. Then, by approximating the MEA estimator, we obtain an estimator whose time complexity is O(L^3 + c^2 d L^2), where L is the length of input sequences and both c and d are constants independent of L. The proposed estimator can handle uncertainty of secondary structures and alignments, which are obstacles in bioinformatics, because it considers all the secondary structures and all the pairwise alignments of the input sequences. Moreover, we integrate the probabilistic consistency transformation (PCT) on alignments into the proposed estimator. Computational experiments using six benchmark datasets indicate that the proposed method achieved a favorable SPS and was the fastest of many state-of-the-art tools for multiple alignments of structured RNAs.

Availability: The software called CentroidAlign, which is an implementation of the algorithm in this article, is freely available on our website: http://www.ncrna.org/software/centroidalign/.

Contact: hamada-michiaki@aist.go.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
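The idea of maximizing expected accuracy for a pairwise alignment can be illustrated with a small sketch. It is not the CentroidAlign implementation: it assumes a matrix p of posterior match probabilities is already available (in the article these come from marginalizing the Sankoff model, which is not reproduced here), and it uses the generic MEA-style dynamic program that selects the alignment with the largest sum of match probabilities. The function name mea_align and the example matrix are hypothetical.

```python
def mea_align(p):
    """Pick aligned pairs (i, j) that maximize the sum of p[i][j].

    p[i][j] is an assumed posterior probability that position i of
    sequence x aligns with position j of sequence y.
    Returns (expected number of correct pairs, list of aligned pairs).
    """
    n = len(p)
    m = len(p[0]) if n else 0

    # best[i][j]: best achievable sum over the prefixes x[:i] and y[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + p[i - 1][j - 1],  # align i-1 with j-1
                best[i - 1][j],                        # gap in y
                best[i][j - 1],                        # gap in x
            )

    # Trace back, preferring gaps on ties so that pairs which add
    # nothing to the score are not reported.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


if __name__ == "__main__":
    # Made-up probabilities for a 2-residue and a 3-residue sequence.
    example = [
        [0.9, 0.1, 0.0],
        [0.1, 0.2, 0.8],
    ]
    score, pairs = mea_align(example)
    print(score, pairs)
```

The dynamic program runs in time proportional to the product of the two sequence lengths once the probabilities are known; the cost of computing those probabilities, which dominates in practice, is outside this sketch.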