Detection of Recombination in DNA Multiple Alignments with Hidden Markov Models
- 1 September 2001
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 8 (4) , 401-427
- https://doi.org/10.1089/106652701752236214
Abstract
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
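The HMM described in the abstract can be illustrated with a small sketch. It is not the paper's implementation and rests on several assumptions that the abstract does not state: hidden states are candidate tree topologies; the chain keeps its topology with probability `nu` and otherwise switches uniformly to one of the others (so `1 - nu` plays the role of the recombination probability); the initial state distribution is uniform; and the per-site emission probabilities `emis[t][s]` are supplied precomputed rather than derived from a tree likelihood. The function names are hypothetical. Branch-length optimization is omitted; only the EM re-estimation of `nu` is shown.

```python
def transition_matrix(k, nu):
    """K x K matrix: keep the topology with prob. nu, else switch uniformly."""
    off = (1.0 - nu) / (k - 1)
    return [[nu if i == j else off for j in range(k)] for i in range(k)]


def forward_backward(emis, nu):
    """Scaled forward-backward pass.

    emis[t][s] is the (assumed precomputed) probability of alignment
    column t under topology s. Returns the per-site posterior over
    topologies and the expected number of 'no change' transitions.
    """
    n, k = len(emis), len(emis[0])
    a = transition_matrix(k, nu)

    # Forward pass with per-site rescaling to avoid underflow.
    first = [emis[0][s] / k for s in range(k)]  # uniform initial distribution
    scale = [sum(first)]
    alpha = [[x / scale[0] for x in first]]
    for t in range(1, n):
        cur = [
            emis[t][j] * sum(alpha[t - 1][i] * a[i][j] for i in range(k))
            for j in range(k)
        ]
        scale.append(sum(cur))
        alpha.append([x / scale[t] for x in cur])

    # Backward pass, using the same scaling factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            sum(a[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k))
            / scale[t + 1]
            for i in range(k)
        ]

    # Posterior probability of each topology at each site.
    gamma = [[alpha[t][s] * beta[t][s] for s in range(k)] for t in range(n)]

    # Expected count of transitions that keep the same topology.
    stay = 0.0
    for t in range(n - 1):
        stay += sum(
            alpha[t][i] * a[i][i] * emis[t + 1][i] * beta[t + 1][i]
            for i in range(k)
        ) / scale[t + 1]
    return gamma, stay


def em_update_nu(emis, nu, iters=20):
    """EM for nu alone: the M-step is expected stays over total transitions."""
    for _ in range(iters):
        _, stay = forward_backward(emis, nu)
        nu = stay / (len(emis) - 1)
    return nu


if __name__ == "__main__":
    # Toy input: 3 topologies; sites 0-49 favor topology 0, sites 50-99 topology 1.
    emis = [[0.6, 0.2, 0.2]] * 50 + [[0.2, 0.6, 0.2]] * 50
    gamma, _ = forward_backward(emis, 0.9)
    print("most probable topology at site 10:", gamma[10].index(max(gamma[10])))
    print("most probable topology at site 90:", gamma[90].index(max(gamma[90])))
    print("re-estimated nu:", em_update_nu(emis, 0.9))
```

A change in the most probable topology along the alignment is what such a model would report as a candidate recombination breakpoint. In a full treatment the emission probabilities would themselves depend on the branch lengths, which the abstract says are optimized jointly with the recombination probability.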