A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation
Open Access
- 1 December 2007
- journal article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 56 (6) , 975-987
- https://doi.org/10.1080/10635150701670569
Abstract
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.
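To make the idea of a random rate partition concrete, the sketch below draws one assignment of sites to rate classes from a Chinese restaurant process, a standard sequential construction of the partition implied by a Dirichlet process prior. It is an illustration under stated assumptions, not the procedure used in the article: the function names, the concentration parameter `alpha`, and the 12-site example are hypothetical, and no likelihood or Markov chain Monte Carlo step is included.

```python
import random


def sample_crp_partition(num_sites, alpha, rng):
    """Draw one partition of sites into rate classes from a Chinese
    restaurant process with concentration parameter alpha (assumed value)."""
    assignments = []   # assignments[i] is the rate-class label of site i
    class_sizes = []   # class_sizes[k] is the number of sites in class k
    for _ in range(num_sites):
        # An existing class k is chosen with weight class_sizes[k];
        # a new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)   # open a new rate class
        else:
            class_sizes[k] += 1     # join an existing rate class
        assignments.append(k)
    return assignments


def codon_position_partition(num_sites):
    """Fixed reference partition: sites grouped by codon position (0, 1, 2)."""
    return [site % 3 for site in range(num_sites)]


rng = random.Random(1)  # fixed seed so the draw is repeatable
partition = sample_crp_partition(12, alpha=1.0, rng=rng)
print("sampled partition:  ", partition)
print("codon-position one: ", codon_position_partition(12))
```

Larger values of `alpha` tend to produce more rate classes, while small values concentrate sites in a few classes. The fixed codon-position partition is included only as the kind of reference scheme that sampled partitions could be compared against.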