Inferring Binding Energies from Selected Binding Sites

Open Access

4 December 2009

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 5 (12) , e1000590
https://doi.org/10.1371/journal.pcbi.1000590

Abstract

We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next-generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms.

Author Summary

The DNA binding sites of transcription factors that control gene expression are often predicted based on a collection of known or selected binding sites. The most commonly used methods for inferring the binding site pattern, or sequence motif, assume that the sites are selected in proportion to their affinity for the transcription factor, ignoring the effect of the transcription factor concentration. We have developed a new maximum likelihood approach, in a program called BEEML, that directly takes into account the transcription factor concentration as well as non-specific contributions to the binding affinity, and we show in simulation studies that it gives a much more accurate model of the transcription factor binding sites than previous methods. We also develop a new method for extracting binding sites for a transcription factor from a random pool of DNA sequences, called high-throughput SELEX (HT-SELEX), and we show that after a single round of selection BEEML can obtain an accurate model of the transcription factor binding sites.
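
The non-linear relationship between binding energy and selection probability can be illustrated with a small sketch. It assumes a standard two-state occupancy form, P = 1 / (1 + exp(E - mu)), with energies and the chemical potential mu in units of kT, and folds non-specific binding in as an extra Boltzmann weight. This is an illustrative assumption, not the BEEML implementation, and the function names `binding_probability` and `boltzmann_weight` are hypothetical.

```python
import math


def boltzmann_weight(energy):
    """Selection weight under the common assumption that sites are
    selected in proportion to affinity, i.e. proportional to exp(-E)."""
    return math.exp(-energy)


def binding_probability(energy, mu, nonspecific_energy=None):
    """Occupancy of a site under an assumed two-state model.

    energy: sequence-specific binding energy in kT (lower binds better)
    mu: chemical potential of the transcription factor in kT
        (increases with concentration)
    nonspecific_energy: optional sequence-independent binding energy in kT
    """
    weight = math.exp(-energy)
    if nonspecific_energy is not None:
        # Assumption: specific and non-specific modes add as Boltzmann weights.
        weight += math.exp(-nonspecific_energy)
    x = weight * math.exp(mu)
    return x / (1.0 + x)


def preference_ratio(strong, weak, mu):
    """How much more often the strong site is bound than the weak one."""
    return binding_probability(strong, mu) / binding_probability(weak, mu)


if __name__ == "__main__":
    strong, weak = 0.0, 2.0  # hypothetical site energies in kT
    print("affinity ratio:", boltzmann_weight(strong) / boltzmann_weight(weak))
    for mu in (-5.0, 0.0, 3.0):
        print("mu =", mu, "occupancy ratio:", preference_ratio(strong, weak, mu))
```

At low concentration (strongly negative mu) the occupancy ratio approaches the affinity ratio, which is the regime in which the proportional-to-affinity assumption holds. As mu rises, strong sites saturate and the ratio shrinks, so selected-site frequencies understate the true energy differences unless mu is modeled explicitly.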
