Empirical profile mixture models for phylogenetic reconstruction
Open Access
- 21 August 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (20) , 2317-2323
- https://doi.org/10.1093/bioinformatics/btn445
Abstract
Motivation: Previous studies have shown that accounting for site-specific amino acid replacement patterns using mixtures of stationary probability profiles offers a promising approach for improving the robustness of phylogenetic reconstructions in the presence of saturation. However, such profile mixture models were introduced only in a Bayesian context, and are not yet available in a maximum likelihood (ML) framework. In addition, these mixture models only perform well on large alignments, from which they can reliably learn the shapes of profiles and their associated weights.
Results: In this work, we introduce an expectation–maximization algorithm for estimating amino acid profile mixtures from alignment databases. We apply it, learning on the HSSP database, and observe that a set of 20 profiles is enough to provide a better statistical fit than currently available empirical matrices (WAG, JTT), in particular on saturated data.
Availability: We have implemented these models in two currently available Bayesian and ML phylogenetic reconstruction programs. The two implementations, PhyloBayes and PhyML, are freely available on our web site (http://atgc.lirmm.fr/cat). They run under Linux and Mac OS X operating systems.
Contact: nicolas.lartillot@lirmm.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
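To make the expectation–maximization idea mentioned in the Results concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: the abstract gives no implementation details, and this version assumes each alignment column can be summarized as a vector of state counts drawn independently from one of K profiles, ignoring the phylogenetic tree and branch lengths entirely. The function name `em_profile_mixture`, its parameters, and the 4-state example data are all hypothetical, chosen only for illustration.

```python
import math


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def _e_step(counts, weights, profiles):
    """Return per-site responsibilities and the total log-likelihood
    (multinomial coefficients are constant and omitted)."""
    resp, log_lik = [], 0.0
    for site in counts:
        log_joint = [
            math.log(w) + sum(n * math.log(p[a]) for a, n in enumerate(site) if n)
            for w, p in zip(weights, profiles)
        ]
        norm = _log_sum_exp(log_joint)
        log_lik += norm
        resp.append([math.exp(v - norm) for v in log_joint])
    return resp, log_lik


def em_profile_mixture(counts, init_profiles, n_iter=50, pseudocount=1e-3):
    """Toy EM for a mixture of categorical profiles (assumption: no tree).

    counts[i][a] is the number of times state a occurs at site i.
    init_profiles must contain strictly positive probabilities.
    """
    n_states = len(counts[0])
    profiles = [list(p) for p in init_profiles]
    weights = [1.0 / len(profiles)] * len(profiles)
    for _ in range(n_iter):
        # E-step: posterior probability of each profile for each site.
        resp = _e_step(counts, weights, profiles)[0]
        # M-step: re-estimate weights and profiles from expected counts.
        for k in range(len(profiles)):
            mass = sum(r[k] for r in resp)
            # Floor the weight so log(weight) stays finite in the next E-step.
            weights[k] = max(mass / len(counts), 1e-12)
            expected = [
                sum(r[k] * site[a] for r, site in zip(resp, counts)) + pseudocount
                for a in range(n_states)
            ]
            total = sum(expected)
            profiles[k] = [x / total for x in expected]
        weight_sum = sum(weights)
        weights = [w / weight_sum for w in weights]
    log_lik = _e_step(counts, weights, profiles)[1]
    return weights, profiles, log_lik


# Hypothetical data: six sites dominated by state 0, four by state 3.
sites = [[9, 1, 0, 0]] * 6 + [[0, 0, 1, 9]] * 4
start = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
weights, profiles, log_lik = em_profile_mixture(sites, start)
```

On this made-up input the fitted weights should come out close to 0.6 and 0.4, with each profile concentrated on the state that dominates its group of sites. A real analysis would use 20 amino acid states and, as the abstract indicates, would learn from alignment databases rather than from raw column counts.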