Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences
- 7 October 2008
- journal article
- Published by The Royal Society in Philosophical Transactions of the Royal Society B: Biological Sciences
- Vol. 363 (1512), 3931-3939
- https://doi.org/10.1098/rstb.2008.0167
Abstract
Models of molecular evolution tend to be overly simplistic caricatures of biology that are prone to assigning high probabilities to biologically implausible DNA or protein sequences. Here, we explore how to construct time-reversible evolutionary models that yield stationary distributions of sequences that match given target distributions. By adopting comparatively realistic target distributions, evolutionary models can be improved. Instead of focusing on estimating parameters, we concentrate on the population genetic implications of these models. Specifically, we obtain estimates of the product of effective population size and relative fitness difference of alleles. The approach is illustrated with two applications to protein-coding DNA. In the first, a codon-based evolutionary model yields a stationary distribution of sequences, which, when the sequences are translated, matches a variable-length Markov model trained on human proteins. In the second, we introduce an insertion–deletion model that describes selectively neutral evolutionary changes to DNA. We then show how to modify the neutral model so that its stationary distribution at the amino acid level can match a profile hidden Markov model, such as the one associated with the Pfam database.
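The idea of a time-reversible model whose stationary distribution matches a chosen target can be illustrated on a toy state space. The sketch below is not taken from the article: it assumes a small set of discrete states instead of whole sequences, a hypothetical neutral rate matrix `NEUTRAL`, and a hypothetical target distribution `TARGET`. It rescales each neutral rate by a fixation factor of the form S / (1 - exp(-S)), a construction in the spirit of Halpern and Bruno (1998), where S plays the role of a population-scaled selection coefficient (a product of effective population size and relative fitness difference, up to a ploidy-dependent constant). Whether the article uses exactly this form is an assumption here.

```python
import math


def fixation_factor(s):
    """Relative fixation rate s / (1 - exp(-s)); tends to 1 as s -> 0 (neutral)."""
    if abs(s) < 1e-8:
        return 1.0 + s / 2.0  # first-order expansion avoids 0/0 near neutrality
    return s / (1.0 - math.exp(-s))


def reversible_rates(neutral, target):
    """Build a rate matrix satisfying detailed balance with respect to `target`.

    neutral: square matrix of neutral rates (assumed: neutral[i][j] > 0
             implies neutral[j][i] > 0; diagonal entries are ignored).
    target:  desired stationary probabilities (all assumed positive).
    Returns (rates, scaled_sel), where scaled_sel[i][j] is the implied
    population-scaled selection coefficient for a change from i to j.
    """
    n = len(target)
    rates = [[0.0] * n for _ in range(n)]
    scaled_sel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or neutral[i][j] == 0.0:
                continue
            s = math.log((target[j] * neutral[j][i]) / (target[i] * neutral[i][j]))
            scaled_sel[i][j] = s
            rates[i][j] = neutral[i][j] * fixation_factor(s)
    for i in range(n):
        # Diagonal entries make each row sum to zero, as in any rate matrix.
        rates[i][i] = -sum(rates[i][j] for j in range(n) if j != i)
    return rates, scaled_sel


# Hypothetical example: four states with equal neutral rates between them.
NEUTRAL = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
TARGET = [0.1, 0.2, 0.3, 0.4]
RATES, SCALED_SEL = reversible_rates(NEUTRAL, TARGET)

# Detailed balance: TARGET[i] * RATES[i][j] equals TARGET[j] * RATES[j][i].
for i in range(4):
    for j in range(4):
        assert abs(TARGET[i] * RATES[i][j] - TARGET[j] * RATES[j][i]) < 1e-12
```

Because detailed balance holds and every row sums to zero, `TARGET` is stationary for `RATES`. The entries of `SCALED_SEL` are antisymmetric: a change favoured in one direction is disfavoured by the same amount in the other.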