Assessment of similarities of pairs and groups of proteins using transformed amino-acid-residue data
- 1 July 1982
- journal article
- Published by Springer Nature in Journal of Molecular Evolution
- Vol. 18 (4) , 240-250
- https://doi.org/10.1007/bf01734102
Abstract
Using as a primary standard a representative set of 208 proteins whose amino-acid-residue mole frequencies have been accurately established, a set of standard distributions of mole frequencies is defined for each amino acid, in terms of which percentile values for the observed mole frequencies of the amino-acid residues in any other protein can be determined. Data so transformed have a distribution much closer to Gaussian than untransformed values, and allow meaningful determinations of correlations between the amino-acid-residue compositions of two proteins as well as between pairs of amino-acid residues within groups of proteins. Of the 153 possible pairs of amino acids (Asx and Glx are used), 39 are significantly correlated at p ≤ 0.01 and 22 at p ≤ 0.001. A percentile table is included for those wishing to use the method with programmable calculators. The transformed data for amino-acid compositions have been used to perform principal components analyses on groups of proteins in order to determine whether meaningful sub-groupings (observable clusters in scatter diagrams) were detectable. Such analyses are shown for the representative set of proteins and for a group of 184 globins. With regard to the globin chains, a correlation with the evolutionary time-scale is observed for alpha chains in the first principal component projection (PCP), which accounts for 22% of the variance, while beta chains show such a correlation in the first and second PCPs (22% and 18% of the variance respectively). Thus, alpha and beta chains appear to diverge from a common progenitor, similar in position to globin chains from “primitive” forms.
Furthermore, globins from “primitive” forms are nearer to one another than they are to globins from the vertebrates, a finding without a priori reason, suggesting perhaps that once a chain has reached a stable relationship with its environment, strong constraints are placed on the co-existing globin chains so that they maintain appropriate interaction with one another. In addition, positions of the epsilon, gamma and delta chains are in the order: epsilon (embryonal) more primitive than gamma (foetal) more primitive than delta equal to beta (adult).
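The percentile transformation and composition correlation described in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the paper's standard distributions and percentile table are not reproduced here, so the sketch assumes a plain empirical (mid-rank) percentile against a reference sample. The reference compositions are synthetic toy values, and all function names (`build_reference`, `percentile`, `transform`, `pearson`, `toy_composition`) are hypothetical. The only figures taken from the abstract are the reference-set size of 208 and the 18 residue classes that result from pooling Asp/Asn as Asx and Glu/Gln as Glx, which give 153 possible pairs.

```python
import bisect
import math
import random
from itertools import combinations

# 18 residue classes: Asx pools Asp + Asn, Glx pools Glu + Gln.
RESIDUES = ["Ala", "Arg", "Asx", "Cys", "Glx", "Gly", "His", "Ile", "Leu",
            "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]


def build_reference(compositions):
    """For each residue, collect its sorted mole frequencies over the reference set."""
    return {r: sorted(c[r] for c in compositions) for r in RESIDUES}


def percentile(sorted_values, x):
    """Empirical mid-rank percentile (0 to 100) of x in a sorted sample.

    Assumption: stands in for the paper's standard distributions.
    """
    lo = bisect.bisect_left(sorted_values, x)
    hi = bisect.bisect_right(sorted_values, x)
    return 100.0 * (lo + hi) / (2 * len(sorted_values))


def transform(composition, reference):
    """Map a protein's mole frequencies to percentile values, residue by residue."""
    return [percentile(reference[r], composition[r]) for r in RESIDUES]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def toy_composition(rng):
    """Synthetic mole fractions summing to 1. NOT real protein data."""
    raw = [rng.random() + 0.1 for _ in RESIDUES]
    total = sum(raw)
    return {r: v / total for r, v in zip(RESIDUES, raw)}


rng = random.Random(0)
# 208 matches the size of the reference set in the abstract; the values are toy data.
reference = build_reference([toy_composition(rng) for _ in range(208)])

protein_a = toy_composition(rng)
protein_b = toy_composition(rng)
ta = transform(protein_a, reference)
tb = transform(protein_b, reference)

# Correlation between the transformed compositions of two proteins.
r_ab = pearson(ta, tb)

# Number of distinct residue pairs among 18 classes.
n_pairs = len(list(combinations(RESIDUES, 2)))
print(n_pairs)  # 153
```

With real data, `reference` would be built from the measured compositions of the standard proteins, and the same transformed vectors would serve as input to a principal components analysis.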