Frequent oligonucleotides and peptides of the Haemophilus influenzae genome
- 1 November 1996
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 24 (21) , 4263-4272
- https://doi.org/10.1093/nar/24.21.4263
Abstract
The complete Haemophilus influenzae genome (1.83 Mb, Rd strain) provides opportunities for characterizing global genomic inhomogeneities and for detecting important sequence signals. Along these lines, new methods for identifying frequent words (oligonucleotides and/or peptides) and their distributions are applied to the H. influenzae genome, with some comparisons and contrasts made with frequent words of other bacterial genomes. Three major classes of frequent oligonucleotides stand out: (i) oligos related to the familiar uptake signal sequences (USSs), AAGTGCGGT (USS+) and its inverted complement (USS-); (ii) multiple tetranucleotide iterations; and (iii) intergenic dyad sequences (IDSs) found as AAGCCCACCCTAC and its dyad form. The USS+ and USS- occur in almost equal counts, are remarkably evenly spaced around the genome, and appear predominantly in the same reading frame of protein coding domains (USS+ translated to Ser-Ala-Val, USS- translated to Thr-Ala-Leu). These observations suggest that USSs contribute to global genomic functions, for example, in replication and/or repair processes, or as membrane attachment sites, or as sequences helping to pack DNA. The long tetranucleotide iterations, virtually unique to H. influenzae (i.e., unknown in other prokaryotes), may produce subpopulations expressing alternative proteins through polymerase slippage during replication and/or homologous recombination. The 13 bp frequent IDS words, invariably intergenic, occur mostly in clusters and provide potential for complex secondary structures, suggesting that these sequences may be important signals for regulating the activity of their flanking genes.
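The relationship between USS+ and USS-, and the in-frame translations quoted above, can be checked with a few lines of Python. This is only an illustrative sketch, not the word-finding method of the paper: the function names, the toy sequence, and the extra trailing base appended to USS+ are assumptions made for the example, and the codon table is the standard genetic code.

```python
from itertools import product

# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def inverted_complement(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def count_occurrences(sequence, word):
    """Count occurrences of word in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(word)
    while start != -1:
        count += 1
        start = sequence.find(word, start + 1)
    return count


def translate(sequence, offset=0):
    """Translate complete codons starting at offset (one-letter codes)."""
    codons = (sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


USS_PLUS = "AAGTGCGGT"
USS_MINUS = inverted_complement(USS_PLUS)  # ACCGCACTT

# Hypothetical toy sequence containing one copy of each word.
toy = "TT" + USS_PLUS + "CC" + USS_MINUS + "AA"
print(count_occurrences(toy, USS_PLUS), count_occurrences(toy, USS_MINUS))

# USS+ read from its second base: AGT GCG GTn. The trailing "A" is an
# arbitrary extra base added here; GTn encodes Val for any n.
print(translate(USS_PLUS + "A", offset=1))  # S, A, V = Ser-Ala-Val
# USS- read from its first base: ACC GCA CTT.
print(translate(USS_MINUS))                 # T, A, L = Thr-Ala-Leu
```

Applied to a real genome, the same counting function would be run on both words over the full sequence; comparing the two counts and the spacing of hits is the kind of summary the abstract reports.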
The frequent oligopeptides of H. influenzae are principally of two kinds: those induced by oligonucleotide frequent words (USSs, tetranucleotide iterations), and those associated with ATP or GTP binding sites, which are generally composed of three motifs: the A-box, which contributes to delineating the binding pocket; the B-box, which functions in hydrolysis; and the C-box, whose function is unknown. The A-box occurs fairly universally in prokaryotes and eukaryotes. The B- and C-motifs appear to be specialized to various functional groups (e.g., transport, recombination, chaperone activity). Other putative motifs correspond to homologs of Escherichia coli motifs, for example, those associated with proteins of transcriptional processing, aminoacyl-tRNA synthetases and proteins functioning in electron transfer.