A hypothesis-based approach for identifying the binding specificity of regulatory proteins from chromatin immunoprecipitation data
- 6 December 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (4) , 423-429
- https://doi.org/10.1093/bioinformatics/bti815
Abstract
Motivation: Genome-wide chromatin-immunoprecipitation (ChIP-chip) detects binding of transcriptional regulators to DNA in vivo at low resolution. Motif discovery algorithms can be used to discover sequence patterns in the bound regions that may be recognized by the immunoprecipitated protein. However, the discovered motifs often do not agree with the binding specificity of the protein, when it is known.
Results: We present a powerful approach to analyzing ChIP-chip data, called THEME, that tests hypotheses concerning the sequence specificity of a protein. Hypotheses are refined using constrained local optimization. Cross-validation provides a principled standard for selecting the optimal weighting of the hypothesis and the ChIP-chip data and for choosing the best refined hypothesis. We demonstrate how to derive hypotheses for proteins from 36 domain families. Using THEME together with these hypotheses, we analyze ChIP-chip datasets for 14 human and mouse proteins. In all cases, the identified motifs are consistent with the published data with regard to the binding specificity of the proteins.
Availability: THEME is freely available for download.
Contact: fraenkel-admin@mit.edu
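The abstract does not give THEME's actual procedure, so the following is only a toy illustration of the general idea of using cross-validation to choose how strongly a prior motif hypothesis is weighted against observed data. Everything in it is an assumption made for illustration: the function names (`best_window`, `refine`, `cv_error`, `select_beta`), the weight `beta`, the simple blending rule used in place of constrained local optimization, the midpoint-threshold classifier, and the synthetic sequences.

```python
import math
import random

BASES = "ACGT"

def hypothesis_from_consensus(consensus, p=0.7):
    """Hypothetical helper: turn a consensus string into a probability matrix."""
    return [{b: p if b == c else (1 - p) / 3 for b in BASES} for c in consensus]

def best_window(seq, pwm):
    """Return (score, start) of the highest log-odds window of seq under pwm."""
    w = len(pwm)
    best = (-math.inf, 0)
    for i in range(len(seq) - w + 1):
        s = sum(math.log(pwm[j][seq[i + j]] / 0.25) for j in range(w))
        if s > best[0]:
            best = (s, i)
    return best

def refine(hypothesis, bound, beta):
    """Blend the hypothesis (weight beta) with base counts taken from the
    best-matching site in each bound sequence. This is a stand-in for a real
    refinement step, not the published method."""
    w = len(hypothesis)
    counts = [{b: 0.0 for b in BASES} for _ in range(w)]
    for seq in bound:
        _, start = best_window(seq, hypothesis)
        for j in range(w):
            counts[j][seq[start + j]] += 1.0
    pwm = []
    for j in range(w):
        # Small pseudocount keeps every probability strictly positive.
        col = {b: beta * hypothesis[j][b] + counts[j][b] + 0.01 for b in BASES}
        total = sum(col.values())
        pwm.append({b: v / total for b, v in col.items()})
    return pwm

def cv_error(hypothesis, bound, unbound, beta, k=5):
    """k-fold cross-validated error of a midpoint-threshold classifier."""
    errors, total = 0, 0
    for fold in range(k):
        train_b = [s for i, s in enumerate(bound) if i % k != fold]
        test_b = [s for i, s in enumerate(bound) if i % k == fold]
        train_u = [s for i, s in enumerate(unbound) if i % k != fold]
        test_u = [s for i, s in enumerate(unbound) if i % k == fold]
        pwm = refine(hypothesis, train_b, beta)
        mean_b = sum(best_window(s, pwm)[0] for s in train_b) / len(train_b)
        mean_u = sum(best_window(s, pwm)[0] for s in train_u) / len(train_u)
        threshold = (mean_b + mean_u) / 2
        errors += sum(best_window(s, pwm)[0] < threshold for s in test_b)
        errors += sum(best_window(s, pwm)[0] >= threshold for s in test_u)
        total += len(test_b) + len(test_u)
    return errors / total

def select_beta(hypothesis, bound, unbound, betas):
    """Pick the hypothesis weight with the lowest cross-validated error."""
    return min(betas, key=lambda b: cv_error(hypothesis, bound, unbound, b))

def toy_data(seed=0, n=40, length=30, motif="TGACTCA"):
    """Synthetic data: bound sequences carry a planted motif, unbound do not."""
    rng = random.Random(seed)

    def rand_seq():
        return "".join(rng.choice(BASES) for _ in range(length))

    unbound = [rand_seq() for _ in range(n)]
    bound = []
    for _ in range(n):
        s = rand_seq()
        pos = rng.randrange(length - len(motif) + 1)
        bound.append(s[:pos] + motif + s[pos + len(motif):])
    return bound, unbound

if __name__ == "__main__":
    hyp = hypothesis_from_consensus("TGACTCA")
    bound, unbound = toy_data()
    for beta in (0.0, 1.0, 10.0):
        print(beta, cv_error(hyp, bound, unbound, beta))
    print("selected beta:", select_beta(hyp, bound, unbound, [0.0, 1.0, 10.0]))
```

The point of the sketch is only the selection step: each candidate weight is scored on held-out sequences rather than on the data used to refine the motif, so a hypothesis that merely overfits the training regions is not rewarded.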