Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA
Open Access
- 10 October 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (23), 2858-2864
- https://doi.org/10.1093/bioinformatics/btl499
Abstract
Motivation: Predicting cis-regulatory modules (CRMs) in higher eukaryotes is a challenging computational task. Commonly used methods to predict CRMs based on the signal of transcription factor binding sites (TFBS) are limited by prior information about transcription factor specificity. More general methods that bypass the reliance on TFBS models are needed for comprehensive CRM prediction.

Results: We have developed a method to predict CRMs called CisPlusFinder that identifies high density regions of perfect local ungapped sequences (PLUSs) based on multiple species conservation. By assuming that PLUSs contain core TFBS motifs that are locally overrepresented, the method attempts to capture the expected features of CRM structure and evolution. Applied to a benchmark dataset of CRMs involved in early Drosophila development, CisPlusFinder predicts more annotated CRMs than all other methods tested. Using the REDfly database, we find that some ‘false positive’ predictions in the benchmark dataset correspond to recently annotated CRMs. Our work demonstrates that CRM prediction methods that combine comparative genomic data with statistical properties of DNA may achieve reasonable performance when applied genome-wide in the absence of an a priori set of known TFBS motifs.

Availability: The program CisPlusFinder can be downloaded at . All software is licensed under the Lesser GNU Public License (LGPL).

Contact: nora.pierstorff@uni-koeln.de

Supplementary information: Supplementary data are available at Bioinformatics online.