Abstract
Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, the computation of the likelihood function for large trees containing more than 1,000 organisms is computationally intensive due to the large number of floating-point operations and high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Gamma and CAT models. The intention of this paper is to show, from a purely empirical point of view, that CAT can be used instead of Gamma. The main advantage of CAT over Gamma lies in its significantly lower memory consumption and faster inference times. An experimental study using RAxML has been performed on 19 real-world datasets comprising 73 to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Gamma and, surprisingly, also yields trees with slightly better Gamma likelihood values. The use of the CAT model reduces the average number of L2 and L3 cache misses by a factor of 8.55.