Inferring Binding Energies from Selected Binding Sites
Open Access
- 4 December 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 5 (12) , e1000590
- https://doi.org/10.1371/journal.pcbi.1000590
Abstract
We employ a biophysical model that accounts for the non-linear relationship between binding energy and the statistics of selected binding sites. The model includes the chemical potential of the transcription factor, non-specific binding affinity of the protein for DNA, as well as sequence-specific parameters that may include non-independent contributions of bases to the interaction. We obtain maximum likelihood estimates for all of the parameters and compare the results to standard probabilistic methods of parameter estimation. On simulated data, where the true energy model is known and samples are generated with a variety of parameter values, we show that our method returns much more accurate estimates of the true parameters and much better predictions of the selected binding site distributions. We also introduce a new high-throughput SELEX (HT-SELEX) procedure to determine the binding specificity of a transcription factor in which the initial randomized library and the selected sites are sequenced with next generation methods that return hundreds of thousands of sites. We show that after a single round of selection our method can estimate binding parameters that give very good fits to the selected site distributions, much better than standard motif identification algorithms.

The DNA binding sites of transcription factors that control gene expression are often predicted based on a collection of known or selected binding sites. The most commonly used methods for inferring the binding site pattern, or sequence motif, assume that the sites are selected in proportion to their affinity for the transcription factor, ignoring the effect of the transcription factor concentration. We have developed a new maximum likelihood approach, in a program called BEEML, that directly takes into account the transcription factor concentration as well as non-specific contributions to the binding affinity, and we show in simulation studies that it gives a much more accurate model of the transcription factor binding sites than previous methods. We also develop a new method for extracting binding sites for a transcription factor from a random pool of DNA sequences, called high-throughput SELEX (HT-SELEX), and we show that after a single round of selection BEEML can obtain an accurate model of the transcription factor binding sites.
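The non-linear relationship described above can be illustrated with a minimal sketch. It is not the BEEML implementation. It assumes an additive energy matrix in units of kT, a uniform random library, and one common saturable-occupancy form in which a non-specific energy sets a floor on binding. The names `site_energy`, `binding_probability`, `selected_distribution`, and `boltzmann_distribution`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import itertools
import math

BASES = "ACGT"


def site_energy(site, energy_matrix):
    """Additive binding energy of a site (kT units); lower means tighter binding."""
    return sum(energy_matrix[pos][base] for pos, base in enumerate(site))


def binding_probability(energy, mu, e_nonspecific):
    """Occupancy of a site given chemical potential mu (assumed functional form).

    The Boltzmann weights of the specific and non-specific binding modes are
    summed, then compared against the weight of the unbound state, exp(-mu).
    """
    weight = math.exp(-energy) + math.exp(-e_nonspecific)
    return weight / (weight + math.exp(-mu))


def all_sites(length):
    """Enumerate every DNA sequence of the given length."""
    return ["".join(s) for s in itertools.product(BASES, repeat=length)]


def selected_distribution(energy_matrix, mu, e_nonspecific):
    """Expected distribution of selected sites from a uniform library."""
    sites = all_sites(len(energy_matrix))
    probs = [
        binding_probability(site_energy(s, energy_matrix), mu, e_nonspecific)
        for s in sites
    ]
    total = sum(probs)
    return {s: p / total for s, p in zip(sites, probs)}


def boltzmann_distribution(energy_matrix):
    """Standard assumption: sites are selected in proportion to exp(-E)."""
    sites = all_sites(len(energy_matrix))
    weights = [math.exp(-site_energy(s, energy_matrix)) for s in sites]
    total = sum(weights)
    return {s: w / total for s, w in zip(sites, weights)}


# Hypothetical 3-bp factor: consensus ACG, each mismatch costs 2 kT.
energy_matrix = [{b: 0.0 if b == c else 2.0 for b in BASES} for c in "ACG"]

saturable = selected_distribution(energy_matrix, mu=1.0, e_nonspecific=8.0)
boltzmann = boltzmann_distribution(energy_matrix)

# Enrichment of the consensus over a triple-mismatch site under each model.
print(saturable["ACG"] / saturable["CAA"])
print(boltzmann["ACG"] / boltzmann["CAA"])
```

In this sketch the saturable model enriches the consensus site less than the proportional-to-affinity model does, because strong sites approach full occupancy and weak sites are held up by non-specific binding. A motif method that ignores concentration would therefore be expected to misestimate the underlying energies from such data.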