Computational framework for the prediction of transcription factor binding sites by multiple data integration
Open Access
- 30 October 2006
- journal article
- review article
- Published by Springer Nature in BMC Neuroscience
- Vol. 7 (S1), S8
- https://doi.org/10.1186/1471-2202-7-s1-s8
Abstract
Control of gene expression is essential to the establishment and maintenance of all cell types, and its dysregulation is involved in the pathogenesis of several diseases. Accurate computational predictions of transcription factor regulation may thus help in understanding complex diseases, including mental disorders in which dysregulation of neural gene expression is thought to play a key role. However, the biological mechanisms underlying the regulation of gene expression are not completely understood, and predictions made with bioinformatics tools typically have poor specificity. We developed a bioinformatics workflow for the prediction of transcription factor binding sites from several independent datasets. We show the advantages of integrating information based on evolutionary conservation and gene expression when tackling the problem of binding site prediction. Consistent results were obtained on a large simulated dataset of 13,050 in silico promoter sequences, on a set of 161 human gene promoters with known binding sites, and on a smaller set of promoters of Myc target genes. Our computational framework can integrate multiple sources of data, and its performance was tested on different datasets. Our results show that integrating information from multiple data sources, such as the genomic sequence of gene promoters, conservation across multiple species, and gene expression data, indeed improves the accuracy of computational predictions.
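The abstract does not say how the three evidence sources are combined. The following is a minimal sketch of one plausible way to do it, under loudly stated assumptions: a position weight matrix (PWM) log-odds scan as the sequence step, a per-base conservation track with values in [0, 1] as a conservation filter, and the Pearson correlation between transcription factor and target-gene expression profiles as an expression filter. All function names, thresholds, and toy data (`log_odds_matrix`, `scan`, `pearson`, `integrate`, `min_cons`, `min_corr`) are hypothetical illustrations, not the authors' workflow.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Turn a position frequency matrix (one dict base->count per column)
    into log2-odds weights against a uniform background (assumption)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguous bases
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def integrate(hits, conservation, width, expr_corr, min_cons=0.5, min_corr=0.5):
    """Keep sequence hits only if the gene's expression tracks the TF
    (expr_corr >= min_corr) and the site's mean conservation >= min_cons."""
    if expr_corr < min_corr:
        return []
    kept = []
    for pos, score in hits:
        mean_cons = sum(conservation[pos:pos + width]) / width
        if mean_cons >= min_cons:
            kept.append((pos, score))
    return kept


if __name__ == "__main__":
    # Toy motif: an idealized E-box (CACGTG) with all counts on the consensus base.
    counts = [{b: (10 if b == c else 0) for b in BASES} for c in "CACGTG"]
    pwm = log_odds_matrix(counts)
    seq = "TTCACGTGAAACACGTGTT"
    hits = scan(seq, pwm, threshold=8.0)  # sequence evidence alone: sites at 2 and 11
    conservation = [0.1] * 2 + [0.9] * 6 + [0.1] * 11  # only the first site is conserved
    r = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # hypothetical TF vs gene profiles
    print(integrate(hits, conservation, len(pwm), r))  # keeps only the site at position 2
```

The design choice here is a simple sequential filter: each extra data source can only remove sequence-based candidates, which is one way integration could raise specificity. A weighted or probabilistic combination of the scores would be an equally valid alternative.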