An Efficient Algorithm for Statistical Multiple Alignment on Arbitrary Phylogenetic Trees
- 1 December 2003
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 10 (6) , 869-889
- https://doi.org/10.1089/106652703322756122
Abstract
We present an efficient algorithm for statistical multiple alignment based on the TKF91 model of Thorne, Kishino, and Felsenstein (1991) on an arbitrary k-leaved phylogenetic tree. The existing algorithms use a hidden Markov model approach, which requires at least O(√5k) states and leads to a time complexity of O(5kLk) , where L is the geometric mean sequence length. Using a combinatorial technique reminiscent of inclusion/exclusion, we are able to sum away the states, thus improving the time complexity to O(2kLk) and considerably reducing memory requirements. This makes statistical multiple alignment under the TKF91 model a definite practical possibility in the case of a phylogenetic tree with a modest number of leaves.Keywords
This publication has 14 references indexed in Scilit:
- An Improved Algorithm for Statistical Alignment of Sequences Related by a Star TreeBulletin of Mathematical Biology, 2002
- Unalignable sequences and molecular evolutionTrends in Ecology & Evolution, 2001
- Evolutionary HMMs: a Bayesian approach to multiple alignmentBioinformatics, 2001
- Statistical alignment: computational properties, homology testing and goodness-of-fit 1 1Edited by J. KarnJournal of Molecular Biology, 2000
- A Probabilistic Treatment of Phylogeny and Sequence AlignmentJournal of Molecular Evolution, 1999
- Effects of sequence alignment procedures on estimates of phylogenyBioEssays, 1998
- Sorting permutations by block-interchangesInformation Processing Letters, 1996
- Evolutionary trees from DNA sequences: A maximum likelihood approachJournal of Molecular Evolution, 1981
- Algorithms for the Longest Common Subsequence ProblemJournal of the ACM, 1977
- A general method applicable to the search for similarities in the amino acid sequence of two proteinsJournal of Molecular Biology, 1970