Analysis of correlations between sites in models of protein sequences
- 1 November 1998
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review E
- Vol. 58 (5) , 6312-6322
- https://doi.org/10.1103/physreve.58.6312
Abstract
A criterion based on conditional probabilities, related to the concept of algorithmic distance, is used to detect correlated mutations at noncontiguous sites on sequences. We apply this criterion to the problem of analyzing correlations between sites in protein sequences; however, the analysis applies generally to networks of interacting sites with discrete states at each site. Elementary models, where explicit results can be derived easily, are introduced. The number of states per site considered ranges from 2, illustrating the relation to familiar classical spin systems, to 20 states, suitable for representing amino acids. Numerical simulations show that the criterion remains valid even when the genetic history of the data samples (e.g., protein sequences), as represented by a phylogenetic tree, introduces nonindependence between samples. Statistical fluctuations due to finite sampling are also investigated and do not invalidate the criterion. A subsidiary result is found: The more homogeneous a population, the more easily its average properties can drift from the properties of its ancestor.
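The abstract does not state the criterion's exact form. As an illustration only, the sketch below uses one common conditional-probability measure that is in the spirit of an information distance: the sum of the two conditional entropies, H(X|Y) + H(Y|X), estimated between pairs of alignment columns. This choice of measure, the function names, and the toy alignment are all assumptions and may differ from the quantity defined in the paper.

```python
from collections import Counter
from math import log2


def conditional_entropy(xs, ys):
    """Estimate H(X|Y) in bits from paired observations (xs[k], ys[k])."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    marg_y = Counter(ys)           # counts of each y
    # H(X|Y) = -sum_{x,y} p(x,y) * log2 p(x|y), with p(x|y) = n(x,y) / n(y)
    return -sum((c / n) * log2(c / marg_y[y]) for (_, y), c in joint.items())


def information_distance(xs, ys):
    """Assumed illustrative measure: H(X|Y) + H(Y|X).

    It is zero when each site fully determines the other and grows
    as the two sites become less predictable from one another.
    """
    return conditional_entropy(xs, ys) + conditional_entropy(ys, xs)


def site_distances(sequences):
    """Distance for every pair of sites (columns) in equal-length sequences."""
    length = len(sequences[0])
    columns = [[seq[i] for seq in sequences] for i in range(length)]
    return {
        (i, j): information_distance(columns[i], columns[j])
        for i in range(length)
        for j in range(i + 1, length)
    }


# Hypothetical toy alignment: sites 0 and 2 always mutate together,
# site 1 varies independently of both.
toy_alignment = ["AKA", "ALA", "GKG", "GLG"]
distances = site_distances(toy_alignment)
```

In this toy case the noncontiguous pair (0, 2) has the smallest distance, which is how such a measure would flag a candidate pair of correlated sites. With real alignments, finite sampling and shared ancestry between sequences affect the estimates, which are the two effects the abstract says the authors investigate.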