Frequent oligonucleotides and peptides of the Haemophilus influenzae genome

1 November 1996

journal article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 24 (21), 4263-4272
https://doi.org/10.1093/nar/24.21.4263

Abstract

The complete Haemophilus influenzae genome (1.83 Mb, Rd strain) provides opportunities for characterizing global genomic inhomogeneities and for detecting important sequence signals. Along these lines, new methods for identifying frequent words (oligonucleotides and/or peptides) and their distributions are applied to the H.influenzae genome, with some comparisons and contrasts made with frequent words of other bacterial genomes. Three major classes of frequent oligonucleotides stand out: (i) oligos related to the familiar uptake signal sequences (USSs), AAGTGCGGT (USS+) and its inverted complement (USS-); (ii) multiple tetranucleotide iterations; and (iii) intergenic dyad sequences (IDSs) found as AAGCCCACCCTAC and its dyad form. The USS+ and USS- occur in almost equal counts, are remarkably evenly spaced around the genome, and appear predominantly in the same reading frame of protein coding domains (USS+ translated to Ser-Ala-Val, USS- translated to Thr-Ala-Leu). These observations suggest that USSs contribute to global genomic functions, for example in replication and/or repair processes, as membrane attachment sites, or as sequences helping to pack DNA. The long tetranucleotide iterations, virtually unique to H.influenzae (i.e., unknown in other prokaryotes), may produce subpopulations expressing alternative proteins through polymerase slippage during replication and/or homologous recombination. The 13 bp frequent IDS words, invariably intergenic, occur mostly in clusters and provide potential for complex secondary structures, suggesting that these sequences may be important signals for regulating the activity of their flanking genes.
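The relationship between USS+ and USS-, and the tripeptides quoted for each, can be checked with a short sketch. This is an illustration only, not the authors' method. It assumes the standard genetic code, and it assumes that USS+ is read one base into the 9-mer (AGT GCG GTN) while USS- is read from its first base. These frame offsets are inferred from the stated translations and are not given in the abstract. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumption:
# the standard table is the appropriate one for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

USS_PLUS = "AAGTGCGGT"  # sequence quoted in the abstract


def reverse_complement(seq):
    """Return the inverted complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, offset=0):
    """Translate complete codons of seq, starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


USS_MINUS = reverse_complement(USS_PLUS)
print(USS_MINUS)

# USS+ read from its second base: AGT GCG GTN. The third codon needs one
# more base, so every possible following base is tried.
for base in "ACGT":
    print(base, translate(USS_PLUS + base, offset=1))

# USS- read from its first base: ACC GCA CTT.
print(translate(USS_MINUS))
```

Under these assumptions, the third codon of USS+ is GTN, which encodes valine whatever the following base, so the Ser-Ala-Val reading does not depend on the flanking sequence.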
The frequent oligopeptides of H.influenzae are principally of two kinds: those induced by oligonucleotide frequent words (USSs, tetranucleotide iterations), and those associated with ATP or GTP binding sites, which are generally composed of three motifs: the A-box, which contributes to delineating the binding pocket; the B-box, which functions in hydrolysis; and the C-box, whose function is unknown. The A-box occurs fairly universally in prokaryotes and eukaryotes. The B- and C-motifs appear to be specialized to various functional groups (e.g., transport, recombination, chaperone activity). Other putative motifs correspond to homologs of Escherichia coli motifs, for example those associated with proteins of transcriptional processing, aminoacyl-tRNA synthetases and proteins functioning in electron transfer.
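The idea of a "frequent word" can be sketched as a plain count of every length-k substring across a set of sequences. The abstract does not state the statistical criterion the authors use, so the raw count threshold below is a stand-in assumption, and the three peptide strings are invented toy data rather than H.influenzae proteins. The function names are hypothetical.

```python
from collections import Counter


def word_counts(sequences, k):
    """Count every overlapping length-k word across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def frequent_words(sequences, k, min_count):
    """Keep words seen at least min_count times (a naive cutoff, assumed here)."""
    return {w: c for w, c in word_counts(sequences, k).items() if c >= min_count}


# Invented toy peptides in one-letter code, for illustration only.
toy_peptides = ["MSAVKL", "GSAVTAL", "TALSAV"]

print(frequent_words(toy_peptides, k=3, min_count=2))
```

The same counter applies to oligonucleotides by passing DNA strings and a larger k. A real analysis would compare observed counts with an expectation under a background model rather than use a fixed cutoff.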
