Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors

Open Access

27 June 2008

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 24 (17), 1850-1857
https://doi.org/10.1093/bioinformatics/btn331

Abstract

Motivation: Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions, with a specific focus on the C₂H₂ zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. These methods perform well in many cases, but their overall predictive accuracy is still limited. One major reason is that all of these methods use weight matrix models to represent DNA-protein interactions, which assume that all base-amino acid contacts contribute independently to the total free energy of binding.

Results: We present a context-dependent model for DNA–zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code for C₂H₂ zinc-finger proteins. The degree of non-independence was detected by comparing the predictions of a linear perceptron model with those of a non-linear neural net (NN) model for DNA–zinc-finger protein interactions. This dependency is supported by the complex base-amino acid contacts observed in structural analyses of DNA–zinc-finger interactions. Using extensive published qualitative and quantitative experimental data, we demonstrate that the context-dependent model developed in this study significantly improves predictions of DNA binding profiles and free energies of binding, for both individual zinc fingers and proteins with multiple zinc fingers, compared with previous position-independent models. This approach can be extended to other protein families with complex base-amino acid residue interactions, which would help to further understand transcriptional regulation in eukaryotic genomes.

Availability: The software is implemented as C programs and is available by request. http://ural.wustl.edu/softwares.html

Contact: stormo@ural.wustl.edu
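The independence assumption described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' perceptron or neural net comparison; it swaps in a simpler technique, the least-squares additive decomposition of a complete two-position energy table, to show how a position-independent model leaves a residual when two positions interact. All names (`additive_fit`, `rms_residual`) and all energy values are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "ACGT"

def additive_fit(energies):
    """Least-squares additive approximation of a complete two-position table.

    energies maps (b1, b2) -> energy for every pair of symbols. For a
    complete, balanced table the best additive fit is
    grand mean + position-1 effect + position-2 effect.
    """
    first = sorted({b1 for b1, _ in energies})
    second = sorted({b2 for _, b2 in energies})
    mean = sum(energies.values()) / len(energies)
    eff1 = {b1: sum(energies[(b1, b2)] for b2 in second) / len(second) - mean
            for b1 in first}
    eff2 = {b2: sum(energies[(b1, b2)] for b1 in first) / len(first) - mean
            for b2 in second}
    return {(b1, b2): mean + eff1[b1] + eff2[b2]
            for b1, b2 in product(first, second)}

def rms_residual(energies, fitted):
    """Root-mean-square difference between a table and its fit."""
    total = sum((energies[k] - fitted[k]) ** 2 for k in energies)
    return (total / len(energies)) ** 0.5

# Hypothetical per-position contributions (arbitrary energy units).
u = {"A": 0.0, "C": 0.5, "G": -1.0, "T": 0.25}
v = {"A": 0.75, "C": -0.5, "G": 0.0, "T": 1.0}

# Table 1: positions contribute independently (pure weight-matrix behaviour).
independent = {(b1, b2): u[b1] + v[b2] for b1, b2 in product(BASES, BASES)}

# Table 2: same contributions plus one context-dependent term for the pair G,C.
dependent = dict(independent)
dependent[("G", "C")] += -1.0

res_independent = rms_residual(independent, additive_fit(independent))
res_dependent = rms_residual(dependent, additive_fit(dependent))

print("additive model residual, independent table:", res_independent)
print("additive model residual, dependent table:  ", res_dependent)
```

In this toy setting the additive model reproduces the first table essentially exactly, while the second table leaves a residual that no choice of per-position weights can remove. A model that allows terms depending on more than one position at a time would be needed to capture it, which is the general motivation the abstract gives for a context-dependent model.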
