Estimating the Frequency of Events That Cause Multiple-Nucleotide Changes
- 1 August 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Genetics
- Vol. 167 (4) , 2027-2043
- https://doi.org/10.1534/genetics.103.023226
Abstract
Existing mathematical models of DNA sequence evolution assume that all substitutions derive from point mutations. There is, however, increasing evidence that larger-scale events, involving two or more consecutive sites, may also be important. We describe a model, denoted SDT, that allows for single-nucleotide, doublet, and triplet mutations. Applied to protein-coding DNA, the SDT model allows doublet and triplet mutations to overlap codon boundaries but still permits data to be analyzed using the simplifying assumption of independence of sites. We have implemented the SDT model for maximum-likelihood phylogenetic inference and have applied it to an alignment of mammalian globin sequences and to 258 other protein-coding sequence alignments from the Pandit database. We find the SDT model's inclusion of doublet and triplet mutations to be overwhelmingly successful in giving statistically significant improvements in fit of model to data, indicating that larger-scale mutation events do occur. Distributions of inferred parameter values over all alignments analyzed suggest that these events are far more prevalent than previously thought. Detailed consideration of our results and the absence of any known mechanism causing three adjacent nucleotides to be substituted simultaneously, however, leads us to suggest that the actual evolutionary events occurring may include still-larger-scale events, such as gene conversion, inversion, or recombination, or a series of rapid compensatory changes.
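The abstract reports statistically significant improvements in fit when doublet and triplet mutations are added to a single-nucleotide model. As a minimal sketch of how such a comparison between nested models is commonly made, the following computes a likelihood ratio test p-value. It is not taken from the paper: the function name, the log-likelihood values, and the choice of two extra free parameters (one doublet rate and one triplet rate) are assumptions made for illustration. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no external library is needed.

```python
import math


def lrt_p_value_2df(lnl_null: float, lnl_alt: float) -> float:
    """Naive likelihood ratio test p-value, assuming 2 extra free parameters.

    lnl_null: maximized log-likelihood of the simpler (nested) model.
    lnl_alt:  maximized log-likelihood of the richer model.
    Both inputs are hypothetical here; real values would come from a
    maximum-likelihood phylogenetics program.
    """
    # Test statistic: twice the gain in log-likelihood, floored at zero
    # to guard against small negative values from numerical optimization.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return math.exp(-stat / 2.0)


# Hypothetical log-likelihoods: a gain of 3 log-likelihood units.
p = lrt_p_value_2df(-1000.0, -997.0)
print(f"p = {p:.4f}")
```

Because mutation rates cannot be negative, a null hypothesis of zero doublet and triplet rates lies on the boundary of the parameter space. The plain chi-square reference distribution used above is then only an approximation, and is generally conservative, so this sketch should be read as an upper bound on the p-value rather than as the procedure used in the paper.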