Linguistic Features of Noncoding DNA Sequences
- 5 December 1994
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review Letters
- Vol. 73 (23), 3169-3172
- https://doi.org/10.1103/physrevlett.73.3169
Abstract
We extend the Zipf approach to analyzing linguistic texts to the statistical study of DNA base pair sequences and find that the noncoding regions are more similar to natural languages than the coding regions. We also adapt the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and demonstrate that noncoding regions in eukaryotes display a smaller entropy and larger redundancy than coding regions, supporting the possibility that noncoding regions of DNA may carry biological information.
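The two measurements named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the procedure, so everything here is an assumption made for illustration: "words" are taken to be overlapping windows of n bases, the Zipf exponent is estimated by an ordinary least-squares fit on log-log axes, and redundancy is normalized as 1 - H(n)/(2n), where 2 bits per base is the maximum entropy of a four-letter alphabet. The function names are hypothetical.

```python
import math
from collections import Counter


def word_counts(seq, n):
    """Count overlapping n-base 'words' by sliding a window one base at a time.

    Assumption: overlapping windows; the abstract does not say how words are defined.
    """
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_exponent(counts):
    """Estimate the Zipf exponent from a rank-frequency list.

    Fits log(frequency) = a - zeta * log(rank) by least squares and returns zeta.
    Requires at least two distinct words.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def n_entropy(counts):
    """Shannon entropy, in bits, of the observed word distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(seq, n):
    """Redundancy 1 - H(n) / (2 n).

    Assumption: normalization by log2(4) = 2 bits per base, the maximum
    entropy of a four-letter alphabet.
    """
    return 1.0 - n_entropy(word_counts(seq, n)) / (2.0 * n)
```

Under these assumptions, a sequence that uses all four bases equally often has zero redundancy at n = 1, while a sequence made of a single repeated base has redundancy 1. On short sequences the estimates for larger n are dominated by finite-sample effects, so comparisons between regions are only meaningful for sequences much longer than 4^n bases.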