Cluster analysis of genes in codon space

1 June 1984

journal article
research article
Published by Springer Nature in Journal of Molecular Evolution

Vol. 20 (2), 167-174
https://doi.org/10.1007/bf02257377

Abstract

We construct a “codon space” in which a given DNA sequence can be plotted as a function of its base composition in each of the three codon positions. We demonstrate that the base composition is very highly nonrandom, with sequences from more primitive organisms having the least random compositions. By using cluster analysis on the points plotted in codon space we show that there is a strong correlation between base composition and type of organism, with the most primitive organisms having the highest A or T content in the second and third codon positions. A smooth transition toward lower A+T and higher G+C content is observed in the second and third codon positions as the evolutionary complexity of the organism increases. Besides this general trend, more detailed structure can be observed in the clustering that will become clearer as the data base is increased.
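The abstract describes plotting a sequence by its base composition at each of the three codon positions. The following is a minimal sketch of that idea, not the authors' published procedure: it assumes the sequence is an in-frame coding sequence, and it assumes that the A+T fraction at each position is a reasonable choice of coordinate (the abstract does not state the exact axes used). The function names `codon_position_composition` and `at_coordinates` are hypothetical.

```python
from collections import Counter


def codon_position_composition(seq):
    """Return base frequencies at codon positions 1, 2 and 3.

    Assumes `seq` starts in frame; a trailing partial codon is ignored,
    and characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    composition = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        counts = Counter(b for b in bases if b in "ACGT")
        total = sum(counts.values())
        composition.append(
            {b: (counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return composition


def at_coordinates(seq):
    """Map a sequence to a 3-D point: A+T fraction at each codon position.

    This choice of coordinates is an assumption made for illustration.
    """
    return tuple(c["A"] + c["T"] for c in codon_position_composition(seq))


if __name__ == "__main__":
    # Toy in-frame sequence: codons ATG, GCC, TAA
    print(at_coordinates("ATGGCCTAA"))
```

Points produced this way for many genes could then be passed to any standard clustering routine; which clustering method the paper used is not stated in the abstract.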

This publication has 7 references indexed in Scilit:

A thermodynamic theory of codon bias in viral genes
Journal of Theoretical Biology, 1983
On the informational content of viral DNA
Journal of Theoretical Biology, 1983
Periodic correlations in DNA sequences and evidence suggesting their evolutionary origin in a comma-less genetic code
Journal of Molecular Evolution, 1981
Codon catalog usage is a genome strategy modulated for gene expressivity
Nucleic Acids Research, 1981
Working of the genetic code
Trends in Biochemical Sciences, 1980
Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type
Nucleic Acids Research, 1980
The Monte Carlo Method of Evaluating Integrals
Published by Defense Technical Information Center (DTIC), 1975