Position dependencies in transcription factor binding sites

Open Access

18 February 2007

journal article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 23 (8), 933-941
https://doi.org/10.1093/bioinformatics/btm055

Abstract

Motivation: Most of the available tools for transcription factor binding site prediction are based on methods which assume no sequence dependence between the binding site base positions. Our primary objective was to investigate the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and to use the resulting data to develop improved scoring functions for binding-site prediction.

Results: Using three statistical tests, we analyzed the number of binding sites showing dependent positions. We analyzed transcription factor–DNA crystal structures for evidence of position dependence. Our final conclusions were that some factors show evidence of dependencies whereas others do not. We observed that the conformational energy (Z-score) of the transcription factor–DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence exists for dependencies, these should be modeled to improve binding-site predictions. However, when no significant dependency is found, this correction should be omitted. This may be done by converting any existing scoring function which assumes independence into a form which includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool.

Availability: http://promoterplot.fmi.ch/cgi-bin/dep.html

Contact: edward.oakeley@fmi.ch

Supplementary information: Supplementary data (1, 2, 3, 4, 5, 6, 7 and 8) are available at Bioinformatics online.
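The abstract describes converting a scoring function that assumes independence into one that includes a dependency correction, but it does not spell out the algorithm. The sketch below is one illustrative way such a conversion could look, not the authors' published method: a log-odds position weight matrix score plus a pairwise log-ratio term applied only to position pairs already judged dependent. The function names, the uniform background of 0.25, the pseudocount of 1, and the form of the correction term are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def position_freqs(sites, i, pseudo=1.0):
    """Smoothed base frequencies at column i of the aligned sites."""
    counts = Counter(s[i] for s in sites)
    total = len(sites) + 4 * pseudo
    return {b: (counts[b] + pseudo) / total for b in BASES}

def independent_score(sites, seq, background=0.25, pseudo=1.0):
    """Log-odds score that treats every position as independent."""
    return sum(
        math.log2(position_freqs(sites, i, pseudo)[base] / background)
        for i, base in enumerate(seq)
    )

def pair_freqs(sites, i, j, pseudo=1.0):
    """Smoothed joint frequencies of the base pair at columns (i, j)."""
    counts = Counter((s[i], s[j]) for s in sites)
    total = len(sites) + 16 * pseudo
    return {
        (a, b): (counts[(a, b)] + pseudo) / total
        for a, b in product(BASES, repeat=2)
    }

def dependency_correction(sites, seq, pairs, pseudo=1.0):
    """Hypothetical correction: log2 of observed joint frequency over the
    product of its marginals, summed over the supplied dependent pairs."""
    total = 0.0
    for i, j in pairs:
        joint = pair_freqs(sites, i, j, pseudo)
        # Marginals are taken from the same smoothed joint table so that
        # the term is zero when the two columns are exactly independent.
        marg_i = sum(joint[(seq[i], b)] for b in BASES)
        marg_j = sum(joint[(a, seq[j])] for a in BASES)
        total += math.log2(joint[(seq[i], seq[j])] / (marg_i * marg_j))
    return total

def corrected_score(sites, seq, pairs=(), pseudo=1.0):
    """Independent score plus the correction; with no pairs it is unchanged."""
    return (independent_score(sites, seq, pseudo=pseudo)
            + dependency_correction(sites, seq, pairs, pseudo))

# Toy alignment (made up): columns 0 and 3 always co-vary (A..T or T..A).
sites = ["ACGT", "ACGT", "TCGA", "TCGA"]
matching = corrected_score(sites, "ACGT", pairs=[(0, 3)])
mismatched = corrected_score(sites, "ACGA", pairs=[(0, 3)])
```

In this toy alignment the correction raises the score of a candidate whose bases at columns 0 and 3 match an observed combination and lowers it for a combination never seen together. Passing an empty `pairs` list leaves the independence-based score untouched, which mirrors the abstract's recommendation to omit the correction when no significant dependency is found. Deciding which pairs are dependent is left to a separate statistical test that this sketch does not implement.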
