Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap
Open Access
- 21 March 2000
- journal article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences
- Vol. 97 (7) , 3288-3291
- https://doi.org/10.1073/pnas.070154797
Abstract
Quantitative analyses of biological sequences generally proceed under the assumption that individual DNA or protein sequence elements vary independently. However, this assumption is not biologically realistic because sequence elements often vary in a concerted manner resulting from common ancestry and structural or functional constraints. We calculated intersite associations among aligned protein sequences by using mutual information. To discriminate associations resulting from common ancestry from those resulting from structural or functional constraints, we used a parametric bootstrap algorithm to construct replicate data sets. These data are expected to have intersite associations resulting solely from phylogeny. By comparing the distribution of our association statistic for the replicate data against that calculated for empirical data, we were able to assign a probability that two sites covaried resulting from structural or functional constraint rather than phylogeny. We tested our method by using an alignment of 237 basic helix–loop–helix (bHLH) protein domains. Comparison of our results against a solved three-dimensional structure confirmed the identification of several sites important to function and structure of the bHLH domain. This analytical procedure has broad utility as a first step in the identification of sites that are important to biological macromolecular structure and function when a solved structure is unavailable.
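The two quantitative steps named in the abstract, mutual information between a pair of alignment columns and comparison of the observed value against a bootstrap distribution, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `mutual_information` and `bootstrap_p_value` are hypothetical, the toy columns are invented for illustration, and the replicate values are assumed to come from sequences simulated along an estimated phylogeny under a substitution model, a step that is not shown here. The plain fraction used for the p-value is one common convention and may differ from the paper's exact calculation.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    Each column is a sequence of residues, one per aligned sequence.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    count_i = Counter(col_i)               # residue counts at site i
    count_j = Counter(col_j)               # residue counts at site j
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written in counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_i[a] * count_j[b]))
    return mi


def bootstrap_p_value(observed, replicate_values):
    """Fraction of replicate statistics at least as large as the observed one.

    replicate_values is assumed to hold the statistic computed for the same
    site pair in each simulated (phylogeny-only) replicate data set.
    """
    if not replicate_values:
        raise ValueError("need at least one replicate value")
    exceed = sum(1 for v in replicate_values if v >= observed)
    return exceed / len(replicate_values)


# Toy example (invented data): two perfectly covarying columns.
site_a = "AABB"
site_b = "CCDD"
observed_mi = mutual_information(site_a, site_b)

# Hypothetical statistics from four phylogeny-only replicates.
null_mi = [0.1, 0.2, 1.0, 0.0]
p = bootstrap_p_value(observed_mi, null_mi)
print(observed_mi, p)
```

A small fraction means that association as strong as the observed one rarely arises from shared ancestry alone under the simulated model, which is the sense in which a site pair would be flagged as a candidate for structural or functional constraint.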