DWE: Discriminating Word Enumerator
Open Access
- 27 August 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (1) , 31-38
- https://doi.org/10.1093/bioinformatics/bth471
Abstract
Motivation: Tissue-specific transcription factor binding sites give insight into tissue-specific transcription regulation.
Results: We describe a word-counting-based tool for de novo tissue-specific transcription factor binding site discovery using expression information in addition to sequence information. We incorporate tissue-specific gene expression through gene classification into positive expression and repressed expression. We present a direct statistical approach to find overrepresented transcription factor binding sites in a foreground promoter sequence set against a background promoter sequence set. Our approach naturally extends to synergistic transcription factor binding site search. We find putative transcription factor binding sites that are overrepresented in the proximal promoters of liver-specific genes relative to proximal promoters of liver-independent genes. Our results indicate that binding sites for hepatocyte nuclear factors (especially HNF-1 and HNF-4) and CCAAT/enhancer-binding protein (C/EBPβ) are the most overrepresented in proximal promoters of liver-specific genes. Our results suggest that HNF-4 has strong synergistic relationships with HNF-1, HNF-4 and HNF-3β and with C/EBPβ.
Availability: Programs are available for use over the Web at http://rulai.cshl.edu/tools/dwe
Contact: ps@cs.pdx.edu; mzhang@cshl.edu
Supplementary information: Data and omitted results are available at http://rulai.cshl.edu/tools/dwe/supp
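The foreground-versus-background word search described in the abstract can be illustrated with a minimal sketch. It is not the DWE implementation: the abstract does not state which statistic the tool uses, so the one-sided hypergeometric test on the number of sequences containing a word is an assumption, as are the function names (`revcomp`, `contains`, `hypergeom_tail`, `enumerate_words`), the default word length of 6, and the choice to count a word on either strand.

```python
from itertools import product
from math import comb


def revcomp(word):
    """Return the reverse complement of a DNA word (uppercase ACGT only)."""
    return word[::-1].translate(str.maketrans("ACGT", "TGCA"))


def contains(seq, word):
    """True if the word occurs in seq on either strand (an assumption of this sketch)."""
    return word in seq or revcomp(word) in seq


def hypergeom_tail(k, n_fg, n_hit, n_total):
    """P(X >= k) for a hypergeometric X: n_fg sequences drawn without
    replacement from n_total sequences, of which n_hit contain the word."""
    upper = min(n_fg, n_hit)
    denom = comb(n_total, n_fg)
    numer = sum(
        comb(n_hit, i) * comb(n_total - n_hit, n_fg - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


def enumerate_words(foreground, background, k=6):
    """Score every k-mer by how overrepresented it is in the foreground set.

    Returns (p_value, word, foreground_hits, background_hits) tuples,
    sorted so the most overrepresented words come first.
    """
    n_fg, n_bg = len(foreground), len(background)
    results = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        fg_hits = sum(contains(s, word) for s in foreground)
        if fg_hits == 0:
            continue  # a word absent from the foreground cannot be enriched there
        bg_hits = sum(contains(s, word) for s in background)
        p = hypergeom_tail(fg_hits, n_fg, fg_hits + bg_hits, n_fg + n_bg)
        results.append((p, word, fg_hits, bg_hits))
    return sorted(results)
```

Calling `enumerate_words(foreground, background)` with two lists of uppercase promoter sequences returns the candidate words ranked by p-value. Exhaustive enumeration is only practical for short words, since the number of candidates grows as 4^k, and the p-values are uncorrected for the number of words tested.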