A Simple Hierarchical Approach to Modeling Distributions of Substitution Rates
Open Access
- 13 October 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 22 (2), 223-234
- https://doi.org/10.1093/molbev/msi009
Abstract
Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a “beta-gamma” model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta-gamma model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model that fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.
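The discretization described in the abstract can be illustrated with a minimal sketch. It reflects one plausible reading of the scheme and is not the authors' implementation: a Beta(p, q) distribution is cut into equiprobable bins, the bin boundaries are reused as cumulative probabilities that partition the baseline distribution, and each category's rate is the baseline's conditional mean over its slice. To stay dependency-free, the baseline here is a unit-mean exponential (a gamma distribution with shape 1), whose quantiles have a closed form. The function names, the midpoint-rule integration, and the bisection search are all assumptions made for illustration.

```python
import math

def beta_breakpoints(p, q, n_cat, steps=1000):
    """Boundaries in [0, 1] that cut Beta(p, q) into n_cat equiprobable bins."""
    def area(upper):
        # Midpoint rule for the unnormalized beta density on [0, upper].
        h = upper / steps
        return h * sum(((i + 0.5) * h) ** (p - 1) * (1 - (i + 0.5) * h) ** (q - 1)
                       for i in range(steps))
    total = area(1.0)
    bounds = [0.0]
    for k in range(1, n_cat):
        lo, hi = 0.0, 1.0
        for _ in range(40):  # bisection on the numerically integrated CDF
            mid = (lo + hi) / 2
            if area(mid) / total < k / n_cat:
                lo = mid
            else:
                hi = mid
        bounds.append((lo + hi) / 2)
    bounds.append(1.0)
    return bounds

def exp_partial_mean(u):
    """Integral of the rate over the lowest-u fraction of a unit-mean exponential."""
    if u >= 1.0:
        return 1.0
    return 1.0 - (1.0 - u) * (1.0 - math.log(1.0 - u))

def discretize(p, q, n_cat):
    """Return (weights, rates) for the beta-partitioned exponential baseline."""
    bounds = beta_breakpoints(p, q, n_cat)
    pairs = list(zip(bounds, bounds[1:]))
    weights = [hi - lo for lo, hi in pairs]
    rates = [(exp_partial_mean(hi) - exp_partial_mean(lo)) / w
             for (lo, hi), w in zip(pairs, weights)]
    return weights, rates

weights, rates = discretize(0.5, 2.0, 4)
for w, r in zip(weights, rates):
    print(f"weight={w:.3f}  rate={r:.3f}")
```

With p = q = 1 the beta distribution is uniform, so the sketch reduces to the familiar equiprobable discretization. Other values of p and q shift probability mass between slow and fast categories while the weighted mean rate stays at 1.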