Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors
Open Access
- 27 June 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (17), 1850-1857
- https://doi.org/10.1093/bioinformatics/btn331
Abstract
Motivation: Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions, with a specific focus on the C2H2 zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. In many cases they performed well, but the overall predictive accuracy of these methods is still limited. One major reason is that all of these methods use weight matrix models to represent DNA-protein interactions, which assume that all base-amino acid contacts contribute independently to the total free energy of binding.
Results: We present a context-dependent model for DNA–zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code for C2H2 zinc-finger proteins. The degree of non-independence was detected by comparing the linear perceptron model with the non-linear neural net (NN) model in their predictions of DNA–zinc-finger protein interactions. This dependency is supported by the complex base-amino acid contacts observed in structural analyses of DNA–zinc-finger interactions. Using extensive published qualitative and quantitative experimental data, we demonstrated that the context-dependent model developed in this study significantly improves predictions of DNA binding profiles and free energies of binding, for both individual zinc fingers and proteins with multiple zinc fingers, compared with previous position-independent models. This approach can be extended to other protein families with complex base-amino acid residue interactions, which would help to further understand transcriptional regulation in eukaryotic genomes.
Availability: The software is implemented as C programs and is available by request. http://ural.wustl.edu/softwares.html
Contact: stormo@ural.wustl.edu
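The independence assumption behind weight matrix models can be illustrated with a toy calculation. The sketch below is not the authors' method or software. It builds two hypothetical tables of binding energies for a two-base site (all values are made up, in arbitrary units), fits the best additive, weight-matrix-style model to each by least squares, and reports the largest residual. The names `additive_fit` and `max_residual` are illustrative assumptions. A residual near zero means the table is fully explained by independent per-position contributions; a clearly non-zero residual signals an inter-positional dependency that an additive model cannot capture.

```python
from itertools import product

BASES = "ACGT"

def additive_fit(energy):
    """Least-squares additive fit to a complete 4x4 table of two-position
    energies: E(b1, b2) ~ mu + a[b1] + b[b2]. For a complete, balanced
    table this is the grand mean plus row and column effects."""
    mu = sum(energy.values()) / len(energy)
    a = {x: sum(energy[x, y] for y in BASES) / 4 - mu for x in BASES}
    b = {y: sum(energy[x, y] for x in BASES) / 4 - mu for y in BASES}
    return {(x, y): mu + a[x] + b[y] for x, y in product(BASES, BASES)}

def max_residual(energy):
    """Largest absolute gap between the table and its best additive fit."""
    fit = additive_fit(energy)
    return max(abs(energy[k] - fit[k]) for k in energy)

# Hypothetical per-position contributions (arbitrary units, not from the paper).
pos1 = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 1.5}
pos2 = {"A": 0.5, "C": 0.0, "G": 1.0, "T": 2.0}

# Table that satisfies the independence assumption exactly.
independent = {(x, y): pos1[x] + pos2[y] for x, y in product(BASES, BASES)}

# Same table plus one made-up interaction: the pair "GC" is extra favorable.
dependent = dict(independent)
dependent["G", "C"] -= 1.0

print("independent table, max residual:", round(max_residual(independent), 6))
print("dependent table, max residual:  ", round(max_residual(dependent), 6))
```

In this toy setting the additive fit reproduces the first table up to floating-point error but leaves a visible residual on the second, which is the kind of gap a non-linear model can close and a linear one cannot.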