Information‐theoretical entropy as a measure of sequence variability

1 December 1991

journal article
research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 11 (4), pp. 297–313
https://doi.org/10.1002/prot.340110408

Abstract

We propose the use of the information-theoretical entropy, S = −Σ_i p_i log₂ p_i, as a measure of variability at a given position in a set of aligned sequences, where p_i is the fraction of times the i-th type (residue, nucleotide, or codon) appears at that position. For protein sequences the sum has up to 20 terms; for nucleotide sequences, up to 4; and for codon sequences, up to 61. We compare S and V_S, a related measure, in detail with V_K, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that S has desirable mathematical properties that V_K lacks, and that it has intuitive and statistical meanings that accord well with the notion of variability. We find that V_K and the S-based measures are highly correlated for the immunoglobulins. We show, by analysis of sequence data and by means of a mathematical model, that this correlation is due to a strong tendency for the frequencies of the amino acid types at a given position to be log-linear in their rank. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank-frequency distribution obvious, although we discuss several possible etiologies.
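A minimal Python sketch of the per-position calculation follows. It computes S from the type fractions p_i in one alignment column and, for comparison, a variability of the Wu–Kabat form V_K = (number of distinct types) / (frequency of the most common type). The abstract does not define V_K, so that formula is an assumption here, as is skipping gap characters; all function names are hypothetical. The last function builds a hypothetical log-linear rank-frequency distribution (log p falling linearly with rank) only to illustrate how S and V_K can rise together; it is not the paper's model.

```python
import math
from collections import Counter
from typing import Sequence


def column_frequencies(column: Sequence[str]) -> list[float]:
    """Fractions p_i of each type observed in one alignment column.

    Gap symbols ('-') are skipped; this is an assumption of the sketch.
    """
    counts = Counter(sym for sym in column if sym != "-")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains only gaps")
    return [n / total for n in counts.values()]


def shannon_entropy(freqs: Sequence[float]) -> float:
    """S = -sum_i p_i log2 p_i in bits; types with p_i = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in freqs if p > 0)


def wu_kabat_variability(freqs: Sequence[float]) -> float:
    """Assumed Wu-Kabat form: distinct types divided by the top type's frequency."""
    present = [p for p in freqs if p > 0]
    return len(present) / max(present)


def column_profile(alignment: Sequence[Sequence[str]], pos: int) -> tuple[float, float]:
    """Return (S, V_K) for position `pos` of equal-length aligned sequences."""
    freqs = column_frequencies([seq[pos] for seq in alignment])
    return shannon_entropy(freqs), wu_kabat_variability(freqs)


def log_linear_frequencies(k: int, q: float) -> list[float]:
    """Hypothetical rank-frequency model: p_r proportional to q**r, r = 0..k-1."""
    weights = [q ** r for r in range(k)]
    total = sum(weights)
    return [w / total for w in weights]
```

For the toy alignment ["ACDE", "ACDF", "AGDF", "ACEY"], the invariant first position gives S = 0 and V_K = 1, while the last position (E, F, F, Y) gives S = 1.5 bits and V_K = 6. A uniform distribution over all 20 amino acids gives S = log₂ 20 ≈ 4.32 bits and V_K = 400. For codon alignments, each sequence can be passed as a list of codon strings so that seq[pos] is a codon.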
