A Biophysical Approach to Transcription Factor Binding Site Discovery
Open Access
- 3 November 2003
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (11) , 2381-2390
- https://doi.org/10.1101/gr.1271603
Abstract
Identification of transcription factor binding sites within regulatory segments of genomic DNA is an important step toward understanding the regulatory circuits that control gene expression. Here, we describe a novel bioinformatics method that bases the classification of potential binding sites explicitly on an estimate of the sequence-specific binding energy of a given transcription factor. The method also estimates the chemical potential of the factor, which defines the binding threshold. In contrast with the widely used information-theoretic weight matrix method, the new approach correctly describes saturation in the transcription factor/DNA binding probability. This significantly reduces the expected number of false positives, particularly in the ubiquitous case of low-specificity factors. In the strong binding limit, the algorithm is related to the “support vector machine” approach to pattern recognition. The new method is used to identify likely genomic binding sites for the E. coli transcription factors collected in the DPInteract database. In addition, for CRP (a global regulatory factor), the likely regulatory modality (i.e., repressor or activator) of predicted binding sites is determined.
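The energy-based classification summarized in the abstract can be illustrated with a minimal sketch. It assumes an additive energy matrix (each position contributes independently, energies in units of kT) and the two-state occupancy p = 1/(1 + exp(E − μ)), where μ is the chemical potential that sets the binding threshold, so that a site counts as bound when E < μ. The matrix values, the 4-bp motif width, μ = 2.0, and all function names are hypothetical and chosen only for illustration; the sketch scans a single strand and does not reproduce how the authors fit the energy matrix or estimate μ. For comparison, the hypothetical `boltzmann_weight` returns the unsaturated weight exp(μ − E), used here only as a stand-in for a score that, like a weight matrix, has no saturation.

```python
import math

# Hypothetical additive energy matrix for a 4-bp motif, in units of kT.
# Each row gives the energy cost of each base at that position; the
# consensus base costs 0, so lower total energy means stronger binding.
# Values are illustrative only, not fitted to any real factor.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 1.5},
    {"A": 2.0, "C": 0.0, "G": 1.5, "T": 2.0},
    {"A": 1.5, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.0, "C": 1.5, "G": 2.0, "T": 0.0},
]


def site_energy(site: str) -> float:
    """Total binding energy of a site, assuming positions contribute additively."""
    if len(site) != len(ENERGY):
        raise ValueError("site length must match the motif width")
    return sum(row[base] for row, base in zip(ENERGY, site.upper()))


def binding_probability(energy: float, mu: float) -> float:
    """Two-state occupancy p = 1 / (1 + exp(E - mu)), computed without overflow."""
    x = energy - mu
    if x >= 0:
        t = math.exp(-x)
        return t / (1.0 + t)
    return 1.0 / (1.0 + math.exp(x))


def boltzmann_weight(energy: float, mu: float) -> float:
    """Unsaturated weight exp(mu - E); exceeds 1 for sites with E < mu."""
    return math.exp(mu - energy)


def is_bound(energy: float, mu: float) -> bool:
    """Classify a site as bound when p > 1/2, which is equivalent to E < mu."""
    return energy < mu


def scan(sequence: str, mu: float):
    """Score every window on the given strand (reverse complement not scanned)."""
    width = len(ENERGY)
    hits = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width]
        e = site_energy(site)
        hits.append((i, site, e, binding_probability(e, mu)))
    return hits


if __name__ == "__main__":
    mu = 2.0  # hypothetical chemical potential, in kT
    for pos, site, e, p in scan("TTACGTAA", mu):
        print(f"{pos}\t{site}\tE={e:.1f}\tp={p:.3f}\tw={boltzmann_weight(e, mu):.3f}")
```

For the consensus site ACGT (E = 0) the occupancy stays below 1 while the unsaturated weight exceeds 1; for weak sites (E well above μ) the two values agree closely, so the saturating form only changes the picture for sites at or below the threshold.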