Theoretical foundations for a quantitative approach to paleogenetics
- 1 June 1972
- journal article
- Published by Springer Nature in Journal of Molecular Evolution
- Vol. 1 (2), 134-149
- https://doi.org/10.1007/bf01659160
Abstract
It is shown that simply counting the number of amino acid differences between two homologous present-day proteins may underestimate the number of mutagenic events that have occurred by more than a factor of three. In a previous paper (Part I) it was shown how to correct quantitatively for multiple mutagenic events at the same base site and for back mutation at that site. In this paper, formulas are derived to correct for multiple mutagenic events within the same codon triplet and for the degeneracy of the genetic code. These formulas are related to the often-used concept of minimum mutation distance, and it is demonstrated that the latter underestimates the number of 3-base changes (per codon) by more than an order of magnitude. The formulas developed in this paper are shown to be capable of detecting, a priori and with statistical significance, the nonrandomness that is known from experiment to exist in the A fibrinopeptides of ox, reindeer, sheep, and goat. The formulas also show, with statistical significance, that the assumption of a single ancestral DNA does not suffice to explain the known number of amino acid differences between pairs of these fibrinopeptides.

More explicitly, the following problems are solved. Consider a protein of $T$ amino acids coded by a polynucleotide of $L = 3T$ individual nucleotide bases. Let exactly $X$ mutagenic events occur randomly along the length of this polynucleotide. After the $X$ mutagenic events have occurred, some number $A \le T$ of amino acid sites will differ from the corresponding sites in the ancestral protein. An explicit formula is derived for $N(A)$, the average number of amino acid substitutions that have occurred. Because of chance identities, the number of amino acid differences $N(d)$ between two homologous present-day proteins will be less than $N_1(A) + N_2(A)$, where the subscripts refer to each homologue; a formula for $N(d)$ is derived.
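The problem statement above can be explored numerically. The Python sketch below is a Monte Carlo simulation, not the closed-form formulas derived in the paper. It assumes a uniformly random ancestral gene of T sense codons, mutagenic events that fall uniformly on the L = 3T base sites and replace the base with one of the other three at equal odds, and that any event creating a stop codon is redrawn. It counts a "substitution" as any event that changes the encoded amino acid at its site; that definition is our assumption and may not match the paper's N(A). All function names (`random_gene`, `mutate`, `aa_diffs`, `simulate`) are hypothetical.

```python
import itertools
import random

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]


def random_gene(T, rng):
    """Ancestral gene: T sense codons drawn uniformly (an assumption)."""
    return [rng.choice(SENSE) for _ in range(T)]


def mutate(gene, X, rng):
    """Apply exactly X point events at uniformly random base sites (L = 3T).

    Returns the mutated gene and the number of events that changed the
    encoded amino acid. Events that would create a stop codon are redrawn
    (an assumption of this sketch).
    """
    gene = list(gene)
    substitutions = 0
    for _ in range(X):
        while True:
            i, j = divmod(rng.randrange(3 * len(gene)), 3)
            old = gene[i]
            base = rng.choice([b for b in BASES if b != old[j]])
            new = old[:j] + base + old[j + 1:]
            if CODE[new] != "*":
                break
        substitutions += CODE[new] != CODE[old]
        gene[i] = new
    return gene, substitutions


def aa_diffs(g1, g2):
    """Number of amino acid sites at which the two encoded proteins differ."""
    return sum(CODE[a] != CODE[b] for a, b in zip(g1, g2))


def simulate(T, X, trials, seed=0):
    """Mean per-lineage substitutions S, per-lineage differences A, and mean d."""
    rng = random.Random(seed)
    tot_s = tot_a = tot_d = 0
    for _ in range(trials):
        anc = random_gene(T, rng)
        g1, s1 = mutate(anc, X, rng)
        g2, s2 = mutate(anc, X, rng)
        a1, a2 = aa_diffs(anc, g1), aa_diffs(anc, g2)
        d = aa_diffs(g1, g2)
        # A site differing between g1 and g2 must differ from anc in at least
        # one lineage; sites changed in both (or to the same residue) count once
        # or not at all, so d can only fall short of a1 + a2.
        assert d <= a1 + a2 <= s1 + s2
        tot_s += s1 + s2
        tot_a += a1 + a2
        tot_d += d
    return tot_s / (2 * trials), tot_a / (2 * trials), tot_d / trials


if __name__ == "__main__":
    for X in (10, 50, 150):
        s, a, d = simulate(T=100, X=X, trials=500)
        print(f"X={X:4d}  S={s:6.1f}  A={a:6.1f}  d={d:6.1f}  d/2={d / 2:6.1f}")
```

Printing S, A, and d/2 side by side for increasing X lets one compare, under these assumptions, the substitutions that occurred in a lineage with what the observed differences reveal; the paper treats these relationships analytically.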
The limits of validity of the commonly used approximation $N(A) = \tfrac{1}{2}N(d)$ are derived. Formulas are also given that permit estimation of the proportion of amino acid substitutions that have occurred by one-base, two-base, and three-base changes.
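The split into one-, two-, and three-base changes, and its relation to minimum mutation distance, can be illustrated with a second sketch under the same assumptions as a random-mutation simulation (uniform ancestral sense codons, uniform point events, stop-creating events redrawn). For each changed amino acid site it tallies how many base positions differ between the ancestral and final codon, and separately the minimum mutation distance, taken here as the fewest base changes that convert any codon of one amino acid into any codon of the other under the standard genetic code. This illustrates the concepts only and is not the paper's estimator; `min_mutation_distance`, `base_change_profile`, and `proportions` are hypothetical names.

```python
import functools
import itertools
import random
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]


def hamming(c1, c2):
    """Number of base positions at which two codons differ."""
    return sum(x != y for x, y in zip(c1, c2))


@functools.lru_cache(maxsize=None)
def min_mutation_distance(aa1, aa2):
    """Fewest base changes turning some codon of aa1 into some codon of aa2."""
    return min(hamming(c1, c2)
               for c1 in SENSE if CODE[c1] == aa1
               for c2 in SENSE if CODE[c2] == aa2)


def mutate(gene, X, rng):
    """Exactly X uniform point events; stop-creating events are redrawn."""
    gene = list(gene)
    for _ in range(X):
        while True:
            i, j = divmod(rng.randrange(3 * len(gene)), 3)
            old = gene[i]
            new = old[:j] + rng.choice([b for b in BASES if b != old[j]]) + old[j + 1:]
            if CODE[new] != "*":
                gene[i] = new
                break
    return gene


def base_change_profile(T, X, trials, seed=0):
    """Tally changed amino acid sites by observed codon distance and by MMD."""
    rng = random.Random(seed)
    observed, minimum = Counter(), Counter()
    for _ in range(trials):
        anc = [rng.choice(SENSE) for _ in range(T)]
        desc = mutate(anc, X, rng)
        for a, b in zip(anc, desc):
            if CODE[a] != CODE[b]:
                observed[hamming(a, b)] += 1
                minimum[min_mutation_distance(CODE[a], CODE[b])] += 1
    return observed, minimum


def proportions(counter):
    """Fraction of changed sites requiring 1, 2, or 3 base changes."""
    total = sum(counter.values())
    return {k: counter[k] / total for k in (1, 2, 3)}


if __name__ == "__main__":
    for X in (20, 100, 300):
        obs, mmd = base_change_profile(T=100, X=X, trials=200)
        print(X, "observed:", proportions(obs), "minimum:", proportions(mmd))
```

Because a site's minimum mutation distance can never exceed its observed codon distance, the minimum-distance tally can only undercount three-base changes. The observed codon distance is itself only a lower bound on the events at a codon, since repeated hits at one base site and back mutations leave no trace in the final sequence.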
References
- Theoretical foundations for a quantitative approach to paleogenetics. Journal of Molecular Evolution, 1972
- An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochemical Genetics, 1970
- A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970
- Non-Darwinian Evolution. Science, 1969
- Construction of Phylogenetic Trees. Science, 1967
- Base Composition of Nonsense Codons in E. coli: Evidence from Amino-Acid Substitutions at a Tryptophan Site in Alkaline Phosphatase. Nature, 1965
- Amino-Acid Sequence Investigations of Fibrinopeptides from Various Mammals: Evolutionary Implications. Nature, 1964
- Some Recent Advances in Studies of the Transcription of the Genetic Message. Elsevier, 1963