Abstract
Because evolution occurs by random events, the actual number of substitutions that occur in any period is not exactly equal to the number expected from the mean rate of substitution, but is statistically distributed about it. In consequence, even if rates of evolution are constant in different lineages, ‘trees’ deduced from descendant protein sequences contain random errors. When there are fewer than about eight differences between the sequences of the most distantly related pair from a set of proteins, this random effect is very large. It can then render trivial the statistical disadvantage inherent in using a crude measure of protein difference, such as amino acid composition or immunological cross-reactivity, in preference to a measure based on amino acid sequence. In some cases, such as classification of mammals on the basis of cytochrome c structure, it appears to make little difference to the reliability of the results whether the sequences of the protein concerned are known or not. It may also be possible to obtain more reliable phylogenetic information from composition measurements on several kinds of protein than one could obtain from sequence measurements on a single kind of protein.
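
As a rough illustration of the scale of this random effect, the sketch below assumes that substitutions accumulate as a Poisson process, an assumption the abstract does not state explicitly. Under that assumption the standard deviation of the observed count equals the square root of the expected count, so the relative spread shrinks only slowly as the expected number of differences grows. The function name `relative_spread` and the example means are hypothetical, chosen for illustration only.

```python
import math


def relative_spread(expected_substitutions: float) -> float:
    """Return std/mean for a Poisson-distributed count with the given mean.

    Assumption (not taken from the abstract): substitutions follow a
    Poisson process, so the variance of the count equals its mean.
    """
    if expected_substitutions <= 0:
        raise ValueError("expected_substitutions must be positive")
    return math.sqrt(expected_substitutions) / expected_substitutions


# Illustrative expected numbers of differences between two sequences.
for mean in (2, 8, 32, 128):
    print(f"expected = {mean:>3}  relative spread = {relative_spread(mean):.2f}")
```

With an expected count of about eight, the relative spread under this assumption is roughly 35%, which may help convey why trees built from so few differences carry large random errors.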