Sequence features of DNA binding sites reveal structural class of associated transcription factor
Open Access
- 2 November 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (2) , 157-163
- https://doi.org/10.1093/bioinformatics/bti731
Abstract
Motivation: A key goal in molecular biology is to understand the mechanisms by which a cell regulates the transcription of its genes. One important aspect of this transcriptional regulation is the binding of transcription factors (TFs) to their specific cis-regulatory counterparts on the DNA. TFs recognize and bind their DNA counterparts according to the structure of their DNA-binding domains (e.g. zinc finger, leucine zipper, homeodomain). The structure of these domains can be used as a basis for grouping TFs into classes. Although the structure of DNA-binding domains varies widely across TFs generally, the TFs within a particular class bind to DNA in a similar fashion, suggesting the existence of class-specific features in the DNA sequences bound by each class of TFs.
Results: In this paper, we apply a sparse Bayesian learning algorithm to identify a small set of class-specific features in the DNA sequences bound by different classes of TFs; the algorithm simultaneously learns a true multi-class classifier that uses these features to predict the DNA-binding domain of the TF that recognizes a particular set of DNA sequences. We train our algorithm on the six largest classes in TRANSFAC, comprising a total of 587 TFs. We learn a six-class classifier for this training set that achieves 87% leave-one-out cross-validation accuracy. We also identify features within cis-regulatory sequences that are highly specific to each class of TF, which has significant implications for how TF binding sites should be modeled for the purpose of motif discovery.
Contact: lee@cs.duke.edu; amink@cs.duke.edu
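To make the evaluation protocol in the abstract concrete, the following is a minimal sketch of leave-one-out cross-validation over TFs, where each TF is represented by features pooled from its set of binding sites. It is not the paper's method: a nearest-centroid classifier stands in for the sparse Bayesian learner, dinucleotide frequencies stand in for the paper's features, and the sequences, class labels, and all function names are invented for illustration.

```python
from collections import Counter
from itertools import product

K = 2  # assumed k-mer length; the abstract does not specify the features used
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_features(sites):
    """Normalized k-mer frequencies pooled over all binding sites of one TF."""
    counts = Counter()
    for site in sites:
        for i in range(len(site) - K + 1):
            counts[site[i:i + K]] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[k] / total for k in KMERS]

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train, x):
    """Nearest-centroid stand-in classifier; train is a list of (features, label)."""
    by_class = {}
    for feats, label in train:
        by_class.setdefault(label, []).append(feats)
    cents = {label: centroid(vecs) for label, vecs in by_class.items()}
    return min(cents, key=lambda label: sq_dist(cents[label], x))

def loocv_accuracy(data):
    """Hold out each TF in turn, train on the rest, and score the prediction."""
    correct = 0
    for i, (feats, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += predict(train, feats) == label
    return correct / len(data)

# Toy binding-site sets, invented for illustration only (not TRANSFAC data).
TOY = [
    (["GGGCGG", "GGGGCG"], "zinc_finger"),
    (["GCGGGG", "GGCGGG"], "zinc_finger"),
    (["CGGGGC", "GGGCGC"], "zinc_finger"),
    (["TAATTA", "ATTAAT"], "homeodomain"),
    (["TTAATT", "AATTAA"], "homeodomain"),
    (["TAATTG", "CATTAA"], "homeodomain"),
]
data = [(kmer_features(sites), label) for sites, label in TOY]
print(loocv_accuracy(data))
```

With real data, each entry of `TOY` would hold the known binding sites of one TF and its structural class, and the stand-in classifier would be replaced by a sparse multi-class learner so that only a small set of class-specific features keeps non-zero weight.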