A General Rule for Ranged Series of Codon Frequencies in Different Genomes

1 April 1989

journal article
research article
Published by Taylor & Francis in Journal of Biomolecular Structure and Dynamics

Vol. 6 (5) , 1001-1012
https://doi.org/10.1080/07391102.1989.10506527

Abstract

Information science widely describes the distribution of information units (words) by frequency of occurrence with the help of a corresponding ranged series, i.e., the sequence of occurrence frequencies p_1, p_2, …, p_n taken in decreasing order. The most commonly used model is the Zipf rule, or Zipf law. In this model p_r is inversely proportional to a certain power of the rank r: p_r = C/r^z (C, z > 0). Upon analysis, the correspondence between the codon distribution and the Zipf model is found to be unsatisfactory. The distribution of letters (in English and some other languages) by occurrence frequency does not obey the Zipf rule either. A new model is proposed for such distributions, in which p_r = C·(ln(n+1) − ln r), where n is the number of distinct symbols (codons). This dependence is approximated by a straight line not in the (ln r, ln p) coordinate system, as for the Zipf model, but in the (ln r, p) coordinate system. It is shown on the basis of statistical criteria that this model is in good agreement with the ranged series of codon frequencies for the best-studied genomes to date. This result may be regarded as an additional argument in favor of the codon-letter analogy (rather than the codon-word analogy) in genetic texts.
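The two models in the abstract can be compared numerically. The following is a minimal sketch, not the authors' procedure: it generates both ranged series normalized to sum to 1, and measures how close an observed series is to a straight line in each coordinate system. The function names, the choice of n = 64 and the use of R² as a linearity measure are assumptions made for illustration. The abstract does not say which statistical criteria the paper applies, and the data here are synthetic, not real codon counts.

```python
import math


def zipf_series(n, z):
    """Zipf model p_r = C / r**z, with C chosen so the frequencies sum to 1."""
    weights = [1.0 / r**z for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_rank_series(n):
    """Proposed model p_r = C * (ln(n + 1) - ln r), normalized to sum to 1."""
    weights = [math.log(n + 1) - math.log(r) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def compare_models(freqs):
    """Rank the positive frequencies and report linearity in both systems."""
    ranked = sorted((f for f in freqs if f > 0), reverse=True)
    ln_r = [math.log(r) for r in range(1, len(ranked) + 1)]
    return {
        # Zipf predicts a straight line in (ln r, ln p).
        "zipf (ln r, ln p)": r_squared(ln_r, [math.log(p) for p in ranked]),
        # The proposed model predicts a straight line in (ln r, p).
        "log-rank (ln r, p)": r_squared(ln_r, ranked),
    }


if __name__ == "__main__":
    n = 64  # assumed here: the 64 codons of the standard genetic code
    print(compare_models(log_rank_series(n)))
    print(compare_models(zipf_series(n, 1.0)))
```

To apply the sketch to real data, pass a list of observed codon frequencies to `compare_models`. The coordinate system with the higher R² indicates which model describes the series better, although R² is only a rough stand-in for a formal goodness-of-fit test.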
