Linguistic Measure of Taxonomic and Functional Relatedness of Nucleotide Sequences

1 June 1990

journal article
research article
Published by Taylor & Francis in Journal of Biomolecular Structure and Dynamics

Vol. 7 (6) , 1251-1268
https://doi.org/10.1080/07391102.1990.10508563

Abstract

The frequencies of “words”, oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence “texts”. Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
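
The general idea of comparing sequences by correlating their word vocabularies can be illustrated with a minimal sketch. The abstract does not define how a "contrast" value is computed, so the code below makes its own assumptions: the contrast of a word is taken to be its observed frequency minus the frequency expected from the base composition alone, and similarity is taken to be the Pearson correlation of the two contrast vectors. The function names (`word_counts`, `contrast_vocabulary`, `linguistic_similarity`) and the default word length `k=3` are hypothetical and chosen for illustration only; this is not the authors' published procedure.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"


def word_counts(seq, k):
    """Count overlapping words of length k that contain only A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in BASES for base in word):
            counts[word] += 1
    return counts


def contrast_vocabulary(seq, k=3):
    """Map every possible k-letter word to an assumed contrast value.

    Assumption (not from the abstract): contrast is the observed word
    frequency minus the frequency expected from base composition alone.
    """
    seq = seq.upper()
    words = word_counts(seq, k)
    total_words = sum(words.values())
    base_counts = Counter(b for b in seq if b in BASES)
    total_bases = sum(base_counts.values())
    if total_words == 0 or total_bases == 0:
        raise ValueError("sequence is too short or contains no A/C/G/T")

    contrast = {}
    for letters in product(BASES, repeat=k):
        word = "".join(letters)
        expected = 1.0
        for base in word:
            expected *= base_counts[base] / total_bases
        observed = words[word] / total_words
        contrast[word] = observed - expected
    return contrast


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def linguistic_similarity(seq_a, seq_b, k=3):
    """Correlate the assumed contrast vocabularies of two sequences."""
    vocab_a = contrast_vocabulary(seq_a, k)
    vocab_b = contrast_vocabulary(seq_b, k)
    keys = sorted(vocab_a)  # both vocabularies share the same 4**k keys
    return pearson([vocab_a[w] for w in keys], [vocab_b[w] for w in keys])
```

Calling `linguistic_similarity(seq_a, seq_b)` on two sequence strings returns a single value between -1 and 1. Under these assumptions, higher values indicate more similar word usage; the scale is not calibrated to the similarity values reported in the article.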
