Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap

Open Access

21 March 2000

journal article
Published in Proceedings of the National Academy of Sciences

Vol. 97 (7), 3288-3291
https://doi.org/10.1073/pnas.070154797

Abstract

Quantitative analyses of biological sequences generally proceed under the assumption that individual DNA or protein sequence elements vary independently. However, this assumption is not biologically realistic, because sequence elements often vary in a concerted manner as a result of common ancestry and structural or functional constraints. We calculated intersite associations among aligned protein sequences by using mutual information. To discriminate associations resulting from common ancestry from those resulting from structural or functional constraints, we used a parametric bootstrap algorithm to construct replicate data sets. These replicate data sets are expected to have intersite associations resulting solely from phylogeny. By comparing the distribution of our association statistic for the replicate data against the value calculated for the empirical data, we were able to assign a probability that two sites covaried because of structural or functional constraint rather than phylogeny. We tested our method by using an alignment of 237 basic helix–loop–helix (bHLH) protein domains. Comparison of our results against a solved three-dimensional structure confirmed the identification of several sites important to the function and structure of the bHLH domain. This analytical procedure has broad utility as a first step in identifying sites that are important to biological macromolecular structure and function when a solved structure is unavailable.
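The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each alignment column is given as a string of residues, that mutual information is estimated from raw observed frequencies, and that the mutual information values for the replicate data sets have already been produced by some phylogeny-based simulation, which is not shown here. The function names `mutual_information` and `bootstrap_p_value` are hypothetical, and the way gaps were handled in the original study is not stated in the abstract.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Assumption: columns are equal-length sequences of residue symbols and
    probabilities are estimated from raw counts (no pseudocounts).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written in terms of counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def bootstrap_p_value(observed_mi, replicate_mis):
    """Fraction of replicate values at least as large as the observed one.

    Assumption: replicate_mis holds the statistic for the same pair of
    sites in each simulated data set, where associations arise from
    phylogeny alone.
    """
    if not replicate_mis:
        raise ValueError("at least one replicate value is required")
    exceed = sum(1 for m in replicate_mis if m >= observed_mi)
    return exceed / len(replicate_mis)


# Toy usage: two perfectly covarying columns, with made-up replicate values.
observed = mutual_information("AABB", "CCDD")
p = bootstrap_p_value(observed, [0.0, 0.2, 1.0, 0.1])
```

A small value of `p` would suggest that phylogeny alone rarely produces an association as strong as the observed one. The replicate values in the usage lines are placeholders chosen for illustration, not results from the study.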
