Computational Complexity of Inferring Phylogenies by Compatibility

Abstract
A well-known approach to inferring phylogenies is to find a phylogeny with which the largest number of characters are perfectly compatible. Variants of this problem depend on whether the characters are cladistic (rooted) or qualitative (unrooted), and binary (two states) or unconstrained (two or more states). The computational cost of known algorithms that guarantee optimal solutions to these problems increases at least exponentially with problem size, so in practice such algorithms can be applied only to small problems. We establish that the four basic variants of the compatibility problem are all NP-complete; efficient algorithms that guarantee optimal solutions are therefore unlikely to exist for any of them.
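
To make the problem concrete, the following is a minimal sketch of one variant only: binary, qualitative (unrooted) characters. It is an illustration, not a method taken from this paper. It assumes the standard result that a set of binary characters is jointly compatible exactly when every pair passes the four-gamete test, and it searches subsets exhaustively, which is why its running time grows exponentially with the number of characters. The function names and the input encoding (one tuple of 0/1 states per character, one entry per taxon) are hypothetical.

```python
from itertools import combinations


def pair_compatible(a, b):
    """Four-gamete test for two unrooted binary characters.

    a and b are equal-length sequences of 0/1 states, one entry per taxon.
    The pair is compatible unless all four state combinations occur.
    """
    return len(set(zip(a, b))) < 4


def largest_compatible_subset(characters):
    """Return indices of a largest pairwise-compatible set of characters.

    Brute force: try subsets from largest to smallest and return the first
    one in which every pair is compatible. Assumes binary characters, for
    which pairwise compatibility implies joint compatibility.
    """
    n = len(characters)
    # Precompute the pairwise compatibility table once.
    ok = [[pair_compatible(characters[i], characters[j]) for j in range(n)]
          for i in range(n)]
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(ok[i][j] for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical data: four binary characters scored on five taxa.
example_characters = [
    (0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 0),
]
print(largest_compatible_subset(example_characters))
```

In this made-up data set, the third character conflicts with each of the others, so the search keeps the remaining three. The exhaustive loop examines up to 2^n subsets for n characters, consistent with the exponential growth described above.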