The effect of codon usage on the oligonucleotide composition of the E. coli genome and identification of over- and underrepresented sequences by Markov chain analysis

Open Access

25 March 1987

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 15 (6) , 2627-2638
https://doi.org/10.1093/nar/15.6.2627

Abstract

As shown in the accompanying paper (5), the oligonucleotide composition of the E. coli genome is highly asymmetric for sequences up to 6 bp in length when ranked from highest to lowest abundance. We show here that this largely reflects codon usage: heavily used codons were found among the highly abundant oligomers, whereas rarely used codons, with some exceptions, occurred in sequences of low abundance. Furthermore, linear regression analysis revealed a strong correlation between the frequency of each trinucleotide and its usage as a codon. Dinucleotides are also not randomly distributed across the three codon positions, and the dinucleotide composition of genes that are transcribed but not translated (rRNA and tRNA genes) closely resembled that of genes encoding polypeptides. However, 45 tetra-, 8 penta-, and 6 hexanucleotides were significantly over- or underabundant by Markov chain analysis and could not be accounted for by codon usage. Of the underrepresented sequences, many were palindromes, including the Dam methylation site (GATC).
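A minimal Python sketch of a Markov chain test for over- and underrepresented oligomers follows. It assumes the common maximal-order model, in which the expected count of a k-mer w is N(w[:-1]) * N(w[1:]) / N(w[1:-1]) estimated from the sequence itself, and it scores deviations with a simple Poisson-style z = (O - E) / sqrt(E). Both the model order and the statistic are assumptions made for illustration and may not match the exact procedure used in the paper; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_scores(seq: str, k: int) -> dict:
    """Score every ACGT k-mer (k >= 3) against a Markov chain of order k - 2.

    Expected count (maximal-order model, assumed here):
        E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1])
    Deviation is reported as a Poisson-style z = (O - E) / sqrt(E).
    Returns {word: (observed, expected, z)} for words with E > 0.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    seq = seq.upper()
    observed = kmer_counts(seq, k)
    n_sub = kmer_counts(seq, k - 1)   # (k-1)-mer counts for prefix and suffix
    n_core = kmer_counts(seq, k - 2)  # (k-2)-mer counts for the shared core
    scores = {}
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        core = n_core[w[1:-1]]
        if core == 0:
            continue  # core never observed: expectation undefined
        expected = n_sub[w[:-1]] * n_sub[w[1:]] / core
        if expected == 0:
            continue  # prefix or suffix absent: nothing to test
        z = (observed[w] - expected) / math.sqrt(expected)
        scores[w] = (observed[w], expected, z)
    return scores
```

For example, markov_scores(genome, 4) on a genome string, sorted by the third tuple field, lists candidate under- (negative z) and overrepresented (positive z) tetranucleotides. A real analysis would also need to consider both strands and correct for testing many words at once.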
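The regression between trinucleotide frequency and codon usage can be sketched in Python as below. The sketch assumes that trinucleotides are counted in all reading frames of one genome strand, that codons are counted in frame 1 of a user-supplied list of coding sequences, and that all 64 triplets (including stop codons) enter the fit; these choices, and the helper names, are illustrative assumptions rather than the paper's documented method.

```python
from collections import Counter
from itertools import product

# All 64 trinucleotides in a fixed order so the two frequency vectors line up.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_freqs(genome: str) -> list:
    """Relative frequency of each trinucleotide, read in all frames of one strand."""
    counts = Counter(genome[i:i + 3] for i in range(len(genome) - 2))
    total = sum(counts[t] for t in TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def codon_usage(genes: list) -> list:
    """Relative frequency of each codon, read in frame 1 of each coding sequence."""
    counts = Counter()
    for g in genes:
        # Step through complete codons only; a trailing partial codon is ignored.
        counts.update(g[i:i + 3] for i in range(0, len(g) - 2, 3))
    total = sum(counts[t] for t in TRIPLETS)
    return [counts[t] / total for t in TRIPLETS]


def least_squares(x: list, y: list) -> tuple:
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5
```

With a hypothetical genome string and a list of gene sequences, least_squares(codon_usage(genes), trinucleotide_freqs(genome)) returns the slope, intercept, and correlation coefficient of trinucleotide frequency regressed on codon usage.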
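To make the codon-position analysis concrete, the following Python sketch tallies dinucleotides in an in-frame coding sequence by the codon position of their first base: positions 1-2, positions 2-3, and position 3 paired with position 1 of the next codon. The function name and the frame convention (the sequence starts at the first base of a codon) are assumptions for illustration. Comparing the three tallies with one another, or with the dinucleotide composition of untranslated rRNA and tRNA genes, is one way to examine the non-random distribution the abstract describes.

```python
from collections import Counter


def dinucleotides_by_codon_position(gene: str) -> dict:
    """Count dinucleotides by the codon position (1, 2, or 3) of their first base.

    Position 3 dinucleotides span the boundary between adjacent codons.
    Assumes gene is in frame, i.e. gene[0] is the first base of a codon.
    """
    by_pos = {1: Counter(), 2: Counter(), 3: Counter()}
    for i in range(len(gene) - 1):
        by_pos[i % 3 + 1][gene[i:i + 2]] += 1
    return by_pos
```

For ATGAAATAA, the sketch assigns AT, AA and TA to position 1; TG, AA and AA to position 2; and GA and AT (the codon-boundary pairs) to position 3.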

This publication has 16 references indexed in Scilit:

A comprehensive package for DNA sequence analysis in FORTRAN IV for the PDP-11
Nucleic Acids Research, 1986
Molecular evolution of bacteriophages: evidence of selection against the recognition sites of host restriction enzymes
Molecular Biology and Evolution, 1986
Transcriptional block caused by a negative supercoiling induced structural change in an alternating CG sequence
Cell, 1984
Strong doublet preferences in nucleotide sequences and DNA geometry
Journal of Molecular Evolution, 1984
Viability of λ phages carrying a perfect palindrome in the absence of recombination nucleases
Nature, 1983
Contextual constraints on synonymous codon choice
Journal of Molecular Biology, 1983
Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: A proposal for a synonymous codon choice that is optimal for the E. coli translational system
Journal of Molecular Biology, 1981
Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification
Proceedings of the National Academy of Sciences, 1981
Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes
Journal of Molecular Biology, 1981
Codon catalog usage is a genome strategy modulated for gene expressivity
Nucleic Acids Research, 1981