Correlations Among Amino Acid Sites in bHLH Protein Domains: An Information Theoretic Analysis
Open Access
- 1 January 2000
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 17 (1) , 164-178
- https://doi.org/10.1093/oxfordjournals.molbev.a026229
Abstract
An information theoretic approach is used to examine the magnitude and origin of associations among amino acid sites in the basic helix-loop-helix (bHLH) family of transcription factors. Entropy and mutual information values are used to summarize the variability and covariability of amino acids comprising the bHLH domain for 242 sequences. When these quantitative measures are integrated with crystal structure data and summarized using helical wheels, they provide important insights into the evolution of three-dimensional structure in these proteins. We show that amino acid sites in the bHLH domain known to pack against each other have very low entropy values, indicating little residue diversity at these contact sites. Noncontact sites, on the other hand, exhibit significantly larger entropy values, as well as statistically significant levels of mutual information or association among sites. High levels of mutual information indicate significant amounts of intercorrelation among amino acid residues at these various sites. Using computer simulations based on a parametric bootstrap procedure, we are able to partition the observed covariation among various amino acid sites into that arising from phylogenetic (common ancestry) and stochastic causes and those resulting from structural and functional constraints. These results show that a significant amount of the observed covariation among amino acid sites is due to structural/functional constraints, over and above the covariation arising from phylogenetic constraints. These quantitative analyses provide a highly integrated evolutionary picture of the multidimensional dynamics of sequence diversity and protein structure.
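The two summary statistics named in the abstract can be illustrated with a minimal sketch. It assumes the standard Shannon definitions, with entropy H(X) = -Σ p(x) log2 p(x) for a single alignment column and mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) for a pair of columns. The choice of log base 2, the function names, and the four-sequence toy alignment are assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' handling of gaps, sample-size bias, or the parametric bootstrap.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the residues found at site i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of the observed symbol frequencies."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two sites: H(X) + H(Y) - H(X,Y)."""
    joint = list(zip(xs, ys))  # paired residues, one pair per sequence
    return entropy(xs) + entropy(ys) - entropy(joint)


# Hypothetical toy alignment (not real bHLH data): sites 0 and 3 are
# invariant, while sites 1 and 2 vary together perfectly.
toy_alignment = [
    "RKAL",
    "RKAL",
    "REVL",
    "REVL",
]

if __name__ == "__main__":
    n_sites = len(toy_alignment[0])
    for i in range(n_sites):
        print(f"site {i}: H = {entropy(column(toy_alignment, i)):.3f} bits")
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            mi = mutual_information(column(toy_alignment, i),
                                    column(toy_alignment, j))
            print(f"sites {i},{j}: I = {mi:.3f} bits")
```

In this toy case an invariant site has zero entropy and therefore shares no information with any other site, which mirrors the abstract's observation that low-entropy contact sites leave little room for covariation, whereas variable sites can show high mutual information.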