An Efficient Algorithm for Statistical Multiple Alignment on Arbitrary Phylogenetic Trees

Abstract
We present an efficient algorithm for statistical multiple alignment based on the TKF91 model of Thorne, Kishino, and Felsenstein (1991) on an arbitrary k-leaved phylogenetic tree. The existing algorithms use a hidden Markov model approach, which requires at least O(√5k) states and leads to a time complexity of O(5kLk) , where L is the geometric mean sequence length. Using a combinatorial technique reminiscent of inclusion/exclusion, we are able to sum away the states, thus improving the time complexity to O(2kLk) and considerably reducing memory requirements. This makes statistical multiple alignment under the TKF91 model a definite practical possibility in the case of a phylogenetic tree with a modest number of leaves.