Modeling the Quantitative Specificity of DNA-Binding Proteins from Example Binding Sites
Open Access
- 25 August 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 4 (8), e6736
- https://doi.org/10.1371/journal.pone.0006736
Abstract
The binding of transcription factors to their respective DNA sites is a key component of every regulatory network. Predictions of transcription factor binding sites are usually based on models for transcription factor specificity. These models, in turn, are often based on examples of known binding sites. Collections of binding sites are obtained in simulation experiments where the true model for the transcription factor is known and various sampling procedures are employed. We compare the accuracies of three different and commonly used methods for predicting the specificity of the transcription factor based on example binding sites. Different methods for constructing the models can lead to significant differences in the accuracy of the predictions, and we show that commonly used methods can be positively misleading, even at large sample sizes and using noise-free data. Methods that minimize the number of predicted binding sequences are often significantly more accurate than the other methods tested. Different methods for generating motifs from example binding sites can have significantly different numbers of false positive and false negative predictions. For many different sampling procedures, models based on quadratic programming are the most accurate.
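To make the idea of a specificity model built from example binding sites concrete, the following is a minimal sketch of one generic approach, a log-odds position weight matrix. The abstract does not spell out the three methods it compares, so this is an illustrative assumption rather than the authors' procedure; the function names, the pseudocount, the uniform background frequency, and the example sites are all hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length example sites.

    Assumes a uniform background frequency and a flat pseudocount;
    both are illustrative choices, not values from the paper.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in BASES
        })
    return matrix


def score(matrix, sequence):
    """Sum the per-position weights for a sequence of matching length."""
    return sum(column[base] for column, base in zip(matrix, sequence))


def predicted_sites(matrix, sequence, threshold):
    """Return (offset, window) pairs whose score reaches the threshold."""
    width = len(matrix)
    return [
        (i, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
        if score(matrix, sequence[i:i + width]) >= threshold
    ]


# Made-up example sites, for illustration only.
example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = log_odds_matrix(example_sites)
hits = predicted_sites(pwm, "GGTATAATGG", threshold=score(pwm, "TATAAT"))
```

In a sketch like this, the threshold determines which sequences count as predicted binding sites, so it directly trades false positives against false negatives, which is the kind of difference between methods that the abstract describes.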