Solving the protein sequence metric problem
- 25 April 2005
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 102 (18), 6395-6400
- https://doi.org/10.1073/pnas.0408677102
Abstract
Biological sequences are composed of long strings of alphabetic letters rather than arrays of numerical values. Lack of a natural underlying metric for comparing such alphabetic data significantly inhibits sophisticated statistical analyses of sequences, modeling structural and functional aspects of proteins, and related problems. Herein, we use multivariate statistical analyses on almost 500 amino acid attributes to produce a small set of highly interpretable numeric patterns of amino acid variability. These high-dimensional attribute data are summarized by five multidimensional patterns of attribute covariation that reflect polarity, secondary structure, molecular volume, codon diversity, and electrostatic charge. Numerical scores for each amino acid then transform amino acid sequences for statistical analyses. Relationships between transformed data and amino acid substitution matrices show significant associations for polarity and codon diversity scores. Transformed alphabetic data are used in analysis of variance and discriminant analysis to study DNA binding in the basic helix-loop-helix proteins. The transformed scores offer a general solution for analyzing a wide variety of sequence analysis problems.
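The transformation step described in the abstract, replacing each letter of a sequence with a vector of five numeric scores, can be illustrated with a minimal sketch. The function names (`transform_sequence`, `factor_means`) and the table `TOY_SCORES` are hypothetical, and the numbers in the table are made-up placeholders that cover only three residues; they are not the published scores. A real analysis would substitute the paper's score table for all 20 amino acids.

```python
from typing import Dict, List, Sequence

# Factor names follow the five patterns listed in the abstract.
FACTOR_NAMES = (
    "polarity",
    "secondary_structure",
    "molecular_volume",
    "codon_diversity",
    "electrostatic_charge",
)

# PLACEHOLDER values (assumption): invented numbers for three residues only,
# used to show the data shape. These are NOT the scores from the paper.
TOY_SCORES: Dict[str, Sequence[float]] = {
    "A": (0.0, 1.0, -1.0, 0.5, 0.0),
    "K": (1.0, 0.0, 0.5, -0.5, 1.0),
    "E": (1.0, 0.5, 0.0, 0.0, -1.0),
}


def transform_sequence(
    seq: str, scores: Dict[str, Sequence[float]]
) -> List[List[float]]:
    """Map each residue letter to its score vector, one row per position."""
    rows = []
    for pos, residue in enumerate(seq.upper()):
        if residue not in scores:
            raise KeyError(f"no scores for residue {residue!r} at position {pos}")
        rows.append(list(scores[residue]))
    return rows


def factor_means(rows: List[List[float]]) -> List[float]:
    """Average each factor (column) over all positions of the sequence."""
    if not rows:
        raise ValueError("empty sequence")
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


matrix = transform_sequence("AKE", TOY_SCORES)   # 3 positions x 5 factors
means = factor_means(matrix)
for name, value in zip(FACTOR_NAMES, means):
    print(f"{name}: {value:.3f}")
```

Once a sequence is in this numeric form, each column can be fed to ordinary statistical procedures such as the analysis of variance and discriminant analysis mentioned in the abstract.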