Information‐theoretical entropy as a measure of sequence variability
- 1 December 1991
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 11 (4) , 297-313
- https://doi.org/10.1002/prot.340110408
Abstract
We propose the use of the information-theoretical entropy, S = −Σ p_i log2 p_i, as a measure of variability at a given position in a set of aligned sequences. p_i stands for the fraction of times the i-th type appears at a position. For protein sequences, the sum has up to 20 terms, for nucleotide sequences, up to 4 terms, and for codon sequences, up to 61 terms. We compare S and VS, a related measure, in detail with VK, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that S has desirable mathematical properties that VK lacks and has intuitive and statistical meanings that accord well with the notion of variability. We find that VK and the S-based measures are highly correlated for the immunoglobulins. We show by analysis of sequence data and by means of a mathematical model that this correlation is due to a strong tendency for the frequency of occurrence of amino acid types at a given position to be log-linear. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank-frequency distribution obvious, although we discuss several possible etiologies.
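As an illustration of the entropy measure described in the abstract, the following is a minimal Python sketch that computes S for each column of a small alignment. The function names, the toy alignment, and the treatment of every character as a residue type (no special gap handling) are assumptions made for this example, not details taken from the paper. The abstract does not define VK; the `wu_kabat` function uses the commonly stated Wu-Kabat form (number of distinct types divided by the relative frequency of the most common type), which is likewise an assumption here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy S = -sum(p_i * log2(p_i)) of one alignment column, in bits."""
    counts = Counter(column)
    total = sum(counts.values())
    # p_i is the fraction of sequences showing the i-th type at this position.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def wu_kabat(column):
    """Assumed Wu-Kabat form: distinct types / relative frequency of the commonest type."""
    counts = Counter(column)
    total = sum(counts.values())
    most_common_count = counts.most_common(1)[0][1]
    return len(counts) * total / most_common_count


# Hypothetical toy alignment: four sequences, four positions.
alignment = ["ACDA", "ACEA", "AGFA", "AGGA"]
columns = ["".join(col) for col in zip(*alignment)]

entropies = [column_entropy(col) for col in columns]
variabilities = [wu_kabat(col) for col in columns]

for position, (s, vk) in enumerate(zip(entropies, variabilities), start=1):
    print(f"position {position}: S = {s:.3f} bits, VK = {vk:.1f}")
```

A fully conserved column gives S = 0, and a column in which k types occur equally often gives S = log2(k), so S for protein sequences cannot exceed log2(20), roughly 4.32 bits.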