Ideal amino acid exchange forms for approximating substitution matrices
- 1 November 2007
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 69 (2), 379-393
- https://doi.org/10.1002/prot.21509
Abstract
We have analyzed 29 published substitution matrices (SMs) and five statistical protein contact potentials (CPs) for comparison. We find that popular, ‘classical’ SMs obtained mainly from sequence alignments of globular proteins mostly correlate with one another at 0.9 or higher. BLOSUM62 is the central element of this group. A second group includes SMs derived from alignments of remote homologs or transmembrane proteins. These matrices correlate better with classical SMs (0.8) than among themselves (0.7). A third group consists of intermediate links between SMs and CPs: matrices and potentials that exhibit mutual correlations of at least 0.8. Next, we show that SMs can be approximated with a correlation of 0.9 by expressions c_0 + x_i x_j + y_i y_j + z_i z_j, 1 ≤ i, j ≤ 20, where c_0 is a constant and the vectors (x_i), (y_i), (z_i) correlate highly with hydrophobicity, molecular volume and coil preferences of amino acids, respectively. The present paper is the continuation of our work (Pokarowski et al., Proteins 2005;59:49–57), where similar approximations were used to derive ideal amino acid interaction forms from CPs. Both approximations allow us to understand general trends in amino acid similarity and can help improve multiple sequence alignments using the fast Fourier transform (MAFFT), fast threading or other methods based on alignments of physicochemical profiles of protein sequences. The use of this approximation in sequence alignments instead of a classical SM yields results that differ by less than 5%. Intermediate links between SMs and CPs, new formulas for approximating these matrices, and the highly significant dependence of classical SMs on coil preferences are new findings. Proteins 2007.
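The approximation form c_0 + x_i x_j + y_i y_j + z_i z_j from the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the constant and the three vectors were fitted, so everything procedural here is an assumption: the constant is taken as the matrix mean, the vectors come from the three largest eigenvalues of the mean-centered matrix, and the function names `approximate_sm` and `matrix_correlation` are hypothetical. The input is a synthetic 20 x 20 matrix, not a published SM.

```python
import numpy as np


def approximate_sm(sm, rank=3):
    """Approximate a symmetric matrix by c0 + sum_k v_k[i] * v_k[j].

    Assumption: c0 is the overall mean and the vectors are scaled
    eigenvectors; the paper's own fitting procedure may differ.
    """
    sm = np.asarray(sm, dtype=float)
    c0 = sm.mean()
    # eigh returns eigenvalues in ascending order for symmetric input
    eigvals, eigvecs = np.linalg.eigh(sm - c0)
    order = np.argsort(eigvals)[::-1][:rank]  # indices of the largest eigenvalues
    # A product form v[i] * v[j] can only carry a non-negative weight
    weights = np.clip(eigvals[order], 0.0, None)
    # Columns play the role of (x_i), (y_i), (z_i)
    vectors = eigvecs[:, order] * np.sqrt(weights)
    approx = c0 + vectors @ vectors.T
    return c0, vectors, approx


def matrix_correlation(a, b):
    """Pearson correlation over the upper triangle, diagonal included."""
    iu = np.triu_indices_from(a)
    return np.corrcoef(a[iu], b[iu])[0, 1]


if __name__ == "__main__":
    # Synthetic stand-in for a 20 x 20 substitution matrix (not real data)
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(20, 3))
    noise = rng.normal(scale=0.3, size=(20, 20))
    toy = 1.0 + hidden @ hidden.T + (noise + noise.T) / 2

    c0, vectors, approx = approximate_sm(toy)
    print("vector array shape:", vectors.shape)
    print("correlation with original:", matrix_correlation(toy, approx))
```

To relate the recovered columns to hydrophobicity, molecular volume or coil preference, one would correlate each column with the corresponding amino acid index; that step needs real index values and is left out here.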
This publication has 58 references indexed in Scilit:
- Inferring ideal amino acid interaction forms from statistical protein contact potentials. Proteins-Structure Function and Bioinformatics, 2005
- The characterization of amino acid sequences in proteins by statistical methods. Published by Elsevier, 2004
- On the design and analysis of protein folding potentials. Proteins-Structure Function and Bioinformatics, 2000
- Towards more meaningful hierarchical classification of amino acid scoring matrices. Protein Engineering, Design and Selection, 1999
- Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues. Proteins-Structure Function and Bioinformatics, 1999
- Nature of Driving Force for Protein Folding: A Result From Analyzing the Statistical Potential. Physical Review Letters, 1997
- Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Engineering, Design and Selection, 1996
- Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets. Protein Science, 1995
- Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, 1992
- A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology, 1982