Comparative DNA sequence features in two long Escherichia coli contigs

Abstract
The recent sequencing of two relatively long (approximately 100 kb) contigs of E. coli presents unique opportunities for investigating heterogeneity and genomic organization of the E. coli chromosome. We have evaluated a number of common and contrasting sequence features in the two new contigs with comparisons to all available E. coli sequences (> 1.6 Mb). Our analyses include assessments of: (i) counts and distributions of restriction sites, special oligonucleotides (e.g., Chi sites, Dam and Dcm methylase targets), and other marker arrays; (ii) significant distant and close direct and inverted repeat sequences; (iii) sequence similarities between the long contigs and other E. coli sequences; (iv) characterization and identification of rare and frequent oligonucleotides; (v) compositional biases in short oligonucleotides; and (vi) position-dependent fluctuations in sequence composition. The two contigs reveal a number of distinctive features, including: a cluster of five repeat/dyad elements with very regular spacings resembling a transcription attenuator in one of the contigs; REP elements, ERICs, and other long repeats; distinction of the Chi sequence as the most frequent oligonucleotide; regions of clustering, overdispersion, and regularity of certain restriction sites and short palindromes; and comparative domains of inhomogeneities in the two long contigs. These and other features are discussed in relation to the organization of the E. coli chromosome.
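As an illustration of analysis (i), the following minimal Python sketch counts the special oligonucleotides named above and reports the spacings between successive sites. It is not the authors' procedure. The motif strings are the standard consensus sequences (Chi: GCTGGTGG; Dam target: GATC; Dcm target: CCWGG, with W = A or T); the function names and the toy sequence are hypothetical and chosen only for this example.

```python
import re

# Standard consensus sequences; the Dcm target CCWGG is written as a regex
# character class because W stands for A or T in IUPAC notation.
MOTIFS = {
    "Chi": "GCTGGTGG",
    "Dam": "GATC",
    "Dcm": "CC[AT]GG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, pattern):
    """Return 0-based start positions of all matches, including overlapping ones.

    The zero-width lookahead lets the regex engine report matches that overlap.
    """
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def count_both_strands(seq, pattern):
    """Count sites on the given strand and on its reverse complement."""
    seq = seq.upper()
    return {
        "forward": len(find_sites(seq, pattern)),
        "reverse": len(find_sites(reverse_complement(seq), pattern)),
    }


def spacings(positions):
    """Return distances between successive site positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Toy sequence (hypothetical, not taken from either contig).
    toy = "GATCGCTGGTGGCCAGGGATC"
    for name, pattern in MOTIFS.items():
        sites = find_sites(toy, pattern)
        print(name, sites, spacings(sites), count_both_strands(toy, pattern))
```

Counting each strand separately matters for asymmetric motifs such as Chi, whereas a palindromic site such as GATC gives the same count on both strands. The list of spacings is the raw material for assessing clustering, overdispersion, or regularity of a marker along a contig.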