Phylogenetic models of rate heterogeneity: a high performance computing perspective
- 1 January 2006
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 15302075, p. 8 pp.-253
- https://doi.org/10.1109/ipdps.2006.1639535
Abstract
Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, computing the likelihood function for huge trees of more than 1,000 organisms is computationally intensive because of the large number of floating-point operations and the high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Gamma and CAT models. The intention of this paper is to show that, from a purely empirical point of view, CAT can be used instead of Gamma. The main advantages of CAT over Gamma are significantly lower memory consumption and shorter inference times. An experimental study using RAxML has been performed on 19 real-world datasets comprising 73 to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Gamma and, surprisingly enough, also yields trees with slightly superior Gamma likelihood values. The use of the CAT model reduces the average number of L2 and L3 cache misses by a factor of 8.55.
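The memory argument can be illustrated with a rough back-of-the-envelope sketch. It is not taken from the paper: it assumes that the Gamma model is discretized into 4 rate categories evaluated at every site (a common choice, not stated in the abstract), that CAT stores a single rate category per site, and that conditional likelihood vectors are kept as double-precision values at each inner node of an unrooted binary tree. The function name `clv_bytes` and the alignment length of 1,000 sites are hypothetical.

```python
def clv_bytes(n_taxa, n_sites, rates_per_site, n_states=4, bytes_per_float=8):
    """Approximate memory for conditional likelihood vectors (illustrative only).

    Assumes an unrooted binary tree, which has n_taxa - 2 inner nodes,
    and one vector entry per site, per DNA state, per rate category.
    """
    inner_nodes = n_taxa - 2
    return inner_nodes * n_sites * n_states * rates_per_site * bytes_per_float


# 1,663 sequences is the largest dataset size named in the abstract;
# the 1,000-site alignment length is an assumption for illustration.
gamma_bytes = clv_bytes(n_taxa=1663, n_sites=1000, rates_per_site=4)  # assumed 4 discrete Gamma rates
cat_bytes = clv_bytes(n_taxa=1663, n_sites=1000, rates_per_site=1)    # CAT: one rate per site

ratio = gamma_bytes / cat_bytes
print(f"Gamma: {gamma_bytes / 1e6:.1f} MB, CAT: {cat_bytes / 1e6:.1f} MB, ratio: {ratio:.1f}")
```

Under these assumptions the likelihood vectors for Gamma need four times the memory of CAT. This estimate covers storage only; the speedup and cache-miss figures reported in the abstract are measured results and are not derived from it.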