Linguistics of Nucleotide Sequences: Morphology and Comparison of Vocabularies
- 1 August 1986
- journal article
- research article
- Published by Taylor & Francis in Journal of Biomolecular Structure and Dynamics
- Vol. 4 (1), 11-21
- https://doi.org/10.1080/07391102.1986.10507643
Abstract
The concept of “words” in continuous languages devoid of blanks is introduced, and an operational definition of words is given. With this novel concept, nucleotide sequences become objects of linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 nucleotides (tri- to pentamers). Different genomes have distinct vocabularies. Comparison of these vocabularies can serve as a basis for revealing functional and evolutionary relatedness of sequences.
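To make the idea of comparing vocabularies concrete, the following is a minimal Python sketch. It does not reproduce the paper's operational definition of a word, which the abstract does not spell out. As a stand-in assumption, it treats the most frequent tri- to pentamers of a sequence as its "vocabulary" and compares two vocabularies by Jaccard similarity. The function names (`kmer_counts`, `vocabulary`, `jaccard`), the `top` cutoff, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer (substring of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def vocabulary(seq, k_values=(3, 4, 5), top=10):
    """Assumed stand-in for a 'vocabulary': the `top` most frequent
    k-mers for each k in k_values. This is NOT the paper's definition."""
    words = set()
    for k in k_values:
        words.update(word for word, _ in kmer_counts(seq, k).most_common(top))
    return words


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy sequences for illustration only; not real genomic data.
    seq_a = "ATGCGATACGCTTGAGATGCGATACGATGCG"
    seq_b = "TTTTAAAATTTTAAAACCCCGGGGTTTTAAA"
    vocab_a = vocabulary(seq_a)
    vocab_b = vocabulary(seq_b)
    print("shared words:", sorted(vocab_a & vocab_b))
    print("similarity:", jaccard(vocab_a, vocab_b))
```

Raw frequency is the simplest possible selection criterion. A closer analogue of "distinct vocabularies" would probably score each k-mer against its expected frequency given the base composition of the sequence, but that choice is likewise an assumption rather than something stated in the abstract.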