Position dependencies in transcription factor binding sites
Open Access
- 18 February 2007
- journal article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (8), 933-941
- https://doi.org/10.1093/bioinformatics/btm055
Abstract
Motivation: Most of the available tools for transcription factor binding site prediction are based on methods which assume no sequence dependence between the binding site base positions. Our primary objective was to investigate the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and to use the resulting data to develop improved scoring functions for binding-site prediction.
Results: Using three statistical tests, we analyzed the number of binding sites showing dependent positions. We analyzed transcription factor–DNA crystal structures for evidence of position dependence. Our final conclusions were that some factors show evidence of dependencies whereas others do not. We observed that the conformational energy (Z-score) of the transcription factor–DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence exists for dependencies, these should be modeled to improve binding-site predictions. However, when no significant dependency is found, this correction should be omitted. This may be done by converting any existing scoring function which assumes independence into a form which includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool.
Availability: http://promoterplot.fmi.ch/cgi-bin/dep.html
Contact: edward.oakeley@fmi.ch
Supplementary information: Supplementary data (1, 2, 3, 4, 5, 6, 7 and 8) are available at Bioinformatics online.
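The idea in the Results, converting a scoring function that assumes independence into one that includes a dependency correction, can be illustrated with a minimal sketch. The abstract does not specify the three statistical tests or the form of the correction, so everything below is an assumption made for illustration: a Pearson chi-square statistic stands in for the dependence test, a log-odds weight matrix stands in for the independent score, and the correction is the log ratio of the joint pair frequency to the product of its marginals. The function names, the pseudocount of 1 and the uniform background of 0.25 are hypothetical choices, not the authors' algorithm or the web tool's implementation.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def column_freqs(sites, i, pseudo=1.0):
    """Base frequencies at position i of aligned sites, with a pseudocount (assumed)."""
    counts = Counter(s[i] for s in sites)
    total = len(sites) + 4 * pseudo
    return {b: (counts[b] + pseudo) / total for b in BASES}

def pair_freqs(sites, i, j, pseudo=1.0):
    """Joint frequencies of base pairs at positions (i, j), with a pseudocount per cell."""
    counts = Counter((s[i], s[j]) for s in sites)
    total = len(sites) + 16 * pseudo
    return {(a, b): (counts[(a, b)] + pseudo) / total
            for a, b in product(BASES, repeat=2)}

def chi_square(sites, i, j):
    """Pearson chi-square statistic for independence of positions i and j.

    Cells whose expected count is zero are skipped. This is an illustrative
    test, not necessarily one of the three tests used in the paper.
    """
    n = len(sites)
    ci = Counter(s[i] for s in sites)
    cj = Counter(s[j] for s in sites)
    cij = Counter((s[i], s[j]) for s in sites)
    stat = 0.0
    for a, b in product(BASES, repeat=2):
        expected = ci[a] * cj[b] / n
        if expected > 0:
            stat += (cij[(a, b)] - expected) ** 2 / expected
    return stat

def independent_score(seq, sites, background=0.25):
    """Log-odds score that treats every position as independent."""
    return sum(math.log2(column_freqs(sites, i)[base] / background)
               for i, base in enumerate(seq))

def corrected_score(seq, sites, dependent_pairs):
    """Independent score plus a correction for each pair flagged as dependent.

    The correction is log2(P(a, b) / (P(a) * P(b))), with the marginals taken
    from the same pseudocounted pair table so that they are consistent with it.
    With no flagged pairs the result equals the independent score.
    """
    score = independent_score(seq, sites)
    for i, j in dependent_pairs:
        joint = pair_freqs(sites, i, j)
        p_a = sum(joint[(seq[i], b)] for b in BASES)
        p_b = sum(joint[(a, seq[j])] for a in BASES)
        score += math.log2(joint[(seq[i], seq[j])] / (p_a * p_b))
    return score

# Toy alignment (made up): positions 0 and 1 always carry the same base.
sites = ["AAC", "TTC", "AAC", "TTC", "AAC", "TTC"]
pairs = [(0, 1)] if chi_square(sites, 0, 1) > chi_square(sites, 0, 2) else []
print(independent_score("AAC", sites), corrected_score("AAC", sites, pairs))
```

In practice the statistic would be compared against a chi-square distribution (or an exact test for small samples) before a pair is flagged. The toy comparison above only shows the shape of the workflow: when no pair is flagged, `corrected_score` reduces to `independent_score`, which mirrors the recommendation to omit the correction when no significant dependency is found.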