Identification of the binding sites of regulatory proteins in bacterial genomes
- 14 August 2002
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 99 (18) , 11772-11777
- https://doi.org/10.1073/pnas.112341999
Abstract
We present an algorithm that extracts the binding sites (represented by position-specific weight matrices) for many different transcription factors from the regulatory regions of a genome, without the need for delineating groups of coregulated genes. The algorithm uses the fact that many DNA-binding proteins in bacteria bind to a bipartite motif with two short segments more conserved than the intervening region. It identifies all statistically significant patterns of the form W1NxW2, where W1 and W2 are two short oligonucleotides separated by x arbitrary bases, and groups them into clusters of similar patterns. These clusters are then used to derive quantitative recognition profiles of putative regulatory proteins. For a given cluster, the algorithm finds the matching sequences plus the flanking regions in the genome and performs a multiple sequence alignment to derive position-specific weight matrices. We have analyzed the Escherichia coli genome with this algorithm and found ≈1,500 significant patterns, which give rise to ≈160 distinct position-specific weight matrices. A fraction of these matrices match the binding sites of one-third of the ≈60 characterized transcription factors with high statistical significance. Many of the remaining matrices are likely to describe binding sites and regulons of uncharacterized transcription factors. The significance of these matrices was evaluated by their specificity, the location of the predicted sites, and the biological functions of the corresponding regulons, allowing us to suggest putative regulatory functions. The algorithm is efficient for analyzing newly sequenced bacterial genomes for which little is known about transcriptional regulation.
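As a rough illustration of two steps named in the abstract (counting patterns of the form W1NxW2 and turning aligned sites into a position-specific weight matrix), the following Python sketch may help. It is not the authors' implementation. The function names, the word length `w=3`, the gap range, the independence-based z-score, the pseudocount, and the uniform background of 0.25 are all assumptions made for this example; the abstract does not specify the statistical test, and the clustering and multiple-alignment steps are omitted.

```python
import math
from collections import Counter


def count_dimers(seqs, w=3, gaps=range(0, 5)):
    """Count every pattern W1 N_x W2: two w-mers separated by x arbitrary bases."""
    counts = Counter()
    for s in seqs:
        for x in gaps:
            span = 2 * w + x
            for i in range(len(s) - span + 1):
                w1 = s[i:i + w]
                w2 = s[i + w + x:i + span]
                counts[(w1, x, w2)] += 1
    return counts


def word_frequencies(seqs, w=3):
    """Relative frequency of each w-mer over all start positions."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - w + 1):
            counts[s[i:i + w]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def dimer_zscores(seqs, w=3, gaps=range(0, 5)):
    """Score each dimer against a null in which W1 and W2 occur independently.

    Assumption: a simple (observed - expected) / sqrt(expected) score,
    used here only as a stand-in for a proper significance test.
    """
    freqs = word_frequencies(seqs, w)
    observed = count_dimers(seqs, w, gaps)
    scores = {}
    for (w1, x, w2), obs in observed.items():
        # Number of positions where a dimer with gap x could start.
        n_pos = sum(max(0, len(s) - (2 * w + x) + 1) for s in seqs)
        expected = n_pos * freqs[w1] * freqs[w2]
        scores[(w1, x, w2)] = (obs - expected) / math.sqrt(expected)
    return scores


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length, already aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for j in range(length):
        column = Counter(site[j] for site in sites)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix
```

In use, `dimer_zscores` would be applied to the upstream regions of a genome and the highest-scoring patterns retained; sequences matching a cluster of similar patterns, once aligned, could then be passed to `weight_matrix`. The toy word length and scoring are chosen for readability, not for biological realism.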