Phylogenetic inference under recombination using Bayesian stochastic topology selection
Open Access
- 20 November 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (2) , 197-203
- https://doi.org/10.1093/bioinformatics/btn607
Abstract
Motivation: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption ignores recombination, a fundamental process for generating diversity, and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenetic topology underlying the relatedness at a genomic location. The number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate over all possible changepoints in topologies as well as all the unknown branch lengths.
Results: We demonstrate our approach on simulated data, on the genome of a suspected HIV recombinant strain, and in an investigation of recombination in the sequences of 15 laboratory mouse strains sequenced by Perlegen Sciences. Our findings indicate that our method allows us to distinguish between rate heterogeneity and variation in phylogeny caused by recombination without being restricted to 4-taxa data.
Availability: The method has been implemented in Java and is available, along with the data studied here, from http://www.stats.ox.ac.uk/~webb.
Contact: cholmes@stats.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
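The idea of summing over every possible placement of topology changepoints can be illustrated with the standard HMM forward recursion. The Python sketch below is not the authors' Java implementation. It assumes that the per-site log-likelihoods under each candidate topology are already available (in the paper these would come from integrating over branch lengths, which is not shown here), that the chain starts uniformly over topologies, and that a switch moves to any other topology with equal probability. The names `forward_log_marginal`, `site_loglik` and `switch_prob` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_marginal(site_loglik, switch_prob):
    """Log-likelihood of an alignment, summed over all topology paths.

    site_loglik[t][k] is the (assumed precomputed) log-likelihood of
    alignment column t under candidate topology k.
    switch_prob is the assumed probability of changing topology between
    adjacent sites; a change goes to each other topology equally often.
    """
    if not 0.0 < switch_prob < 1.0:
        raise ValueError("switch_prob must lie strictly between 0 and 1")
    n_topologies = len(site_loglik[0])
    if n_topologies == 1:
        # With a single topology there are no changepoints to sum over.
        return sum(row[0] for row in site_loglik)

    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (n_topologies - 1))

    # Uniform initial distribution over topologies (an assumption).
    alpha = [-math.log(n_topologies) + ll for ll in site_loglik[0]]

    # Forward recursion: alpha[k] is the log joint probability of the
    # columns seen so far and of topology k holding at the current site.
    for row in site_loglik[1:]:
        alpha = [
            row[k]
            + logsumexp(
                [
                    alpha[j] + (log_stay if j == k else log_move)
                    for j in range(n_topologies)
                ]
            )
            for k in range(n_topologies)
        ]
    return logsumexp(alpha)


if __name__ == "__main__":
    # Two sites, two candidate topologies, made-up likelihood values.
    example = [
        [math.log(0.2), math.log(0.6)],
        [math.log(0.5), math.log(0.1)],
    ]
    print(math.exp(forward_log_marginal(example, switch_prob=0.1)))
```

The recursion costs on the order of T × K² operations for T sites and K topologies, instead of enumerating all K^T topology paths. That saving is what makes it practical to embed such a marginal likelihood inside an MCMC sampler over the set of topologies.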