Phylogenetic MCMC Algorithms Are Misleading on Mixtures of Trees

Abstract
Markov chain Monte Carlo (MCMC) algorithms play a critical role in the Bayesian approach to phylogenetic inference. We present a theoretical analysis of the rate of convergence of many of the widely used Markov chains. For N characters generated from a uniform mixture of two trees, we prove that the Markov chains take an exponentially long (in N) number of iterations to converge to the posterior distribution. Nevertheless, the likelihood plots for sample runs of the Markov chains deceivingly suggest that the chains converge rapidly to a unique tree. Our results rely on novel mathematical understanding of the log-likelihood function on the space of phylogenetic trees. The practical implications of our work are that Bayesian MCMC methods can be misleading when the data are generated from a mixture of trees. Thus, in cases of data containing potentially conflicting phylogenetic signals, phylogenetic reconstruction should be performed separately on each signal.