PyEvolve: a toolkit for statistical modelling of molecular evolution

Open Access

5 January 2004

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 5 (1), 1
https://doi.org/10.1186/1471-2105-5-1

Abstract

Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation.

Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Parallel execution was up to five times faster than serial execution. Compared to leading alternative software, PyEvolve exhibited significantly better real-world performance for parameter-rich models with a large data set, reducing the time required for optimisation from ~10 days to ~6 hours.

PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter-rich likelihood functions solvable within hours on multi-CPU hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.
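
To give a sense of what a dinucleotide substitution model with a CpG parameter can look like, the following is a minimal sketch in plain Python. It does not use PyEvolve and is not the authors' definition: the function names (`rate`, `rate_matrix`), the parameters `kappa` and `cpg`, their default values, and the rule that the CpG term applies to CG → TG and CG → CA are all assumptions made for illustration, since the abstract does not state the model's exact form.

```python
from itertools import product

BASES = "ACGT"
# The 16 dinucleotide states: AA, AC, ..., TT.
STATES = ["".join(p) for p in product(BASES, repeat=2)]
# Purine<->purine and pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def rate(src, dst, kappa=2.0, cpg=5.0):
    """Instantaneous rate from dinucleotide src to dst (hypothetical rule)."""
    diffs = [(a, b) for a, b in zip(src, dst) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-position changes are instantaneous
    r = kappa if diffs[0] in TRANSITIONS else 1.0
    if src == "CG" and dst in ("TG", "CA"):
        r *= cpg  # assumed extra rate for mutation of a methylated CpG
    return r


def rate_matrix(kappa=2.0, cpg=5.0):
    """Build the 16 x 16 rate matrix as a list of rows."""
    q = [[rate(s, d, kappa, cpg) for d in STATES] for s in STATES]
    for i, row in enumerate(q):
        row[i] = -sum(row)  # each row of a rate matrix sums to zero
    return q
```

In a full likelihood calculation, a matrix like this would be exponentiated along each branch of the tree to obtain substitution probabilities, and `kappa` and `cpg` would be estimated by maximising the likelihood rather than fixed as they are here.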
