Searching for Convergence in Phylogenetic Markov Chain Monte Carlo
Open Access
- 1 August 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 55 (4) , 553-565
- https://doi.org/10.1080/10635150600812544
Abstract
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, δ and ε, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a “metachain” to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely.
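The "metachain" idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not compute their δ or ε statistics, which the abstract does not define. It assumes each sampled tree is already available as a list of bipartitions (each given as the set of taxa on one side), and the names `normalize`, `metachain`, `bipartition_posteriors`, and the `burn_in` parameter are hypothetical.

```python
from collections import Counter
from itertools import chain


def normalize(split, taxa):
    """Canonical form of a bipartition: the side that does NOT contain
    the reference taxon (here, the smallest taxon label)."""
    side = frozenset(split)
    ref = min(taxa)
    return frozenset(taxa) - side if ref in side else side


def metachain(runs, burn_in):
    """Concatenate the post-burn-in samples of several replicated runs.
    `burn_in` (samples discarded per run) is an assumed parameter."""
    return list(chain.from_iterable(run[burn_in:] for run in runs))


def bipartition_posteriors(samples, taxa):
    """Estimate each bipartition's posterior probability as the fraction
    of sampled trees that contain it."""
    counts = Counter()
    for tree in samples:
        # Use a set so a bipartition is counted at most once per tree.
        counts.update({normalize(s, taxa) for s in tree})
    n = len(samples)
    return {split: c / n for split, c in counts.items()}
```

Pooling two short runs and discarding the first sample of each would look like `bipartition_posteriors(metachain([run1, run2], burn_in=1), taxa)`. Normalizing each bipartition to a single canonical side ensures that the two equivalent descriptions of the same split are counted together.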