A novel ensemble learning method for de novo computational identification of DNA binding sites
Open Access
- 12 July 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1), 1-15
- https://doi.org/10.1186/1471-2105-8-249
Abstract
Despite the diversity of motif representations and search algorithms, the de novo computational identification of transcription factor binding sites remains constrained by the limited accuracy of existing algorithms and the need for user-specified input parameters that describe the motif being sought. We present a novel ensemble learning method, SCOPE, that is based on the assumption that transcription factor binding sites belong to one of three broad classes of motifs: non-degenerate, degenerate and gapped motifs. SCOPE employs a unified scoring metric to combine the results from three motif-finding algorithms, each aimed at the discovery of one of these classes of motifs. We found that SCOPE's performance on 78 experimentally characterized regulons from four species was a substantial and statistically significant improvement over that of its component algorithms. SCOPE outperformed a broad range of existing motif discovery algorithms on the same dataset by a statistically significant margin. SCOPE demonstrates that combining multiple, focused motif discovery algorithms can provide a significant gain in performance. By building on components that efficiently search for motifs without user-defined parameters, SCOPE requires as input only a set of upstream sequences and a species designation, making it a practical choice for non-expert users. A user-friendly web interface, Java source code and executables are available at http://genie.dartmouth.edu/scope.
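The ensemble idea in the abstract, several specialised motif finders whose candidates are ranked on one shared score, can be illustrated with a minimal Python sketch. This is not SCOPE's implementation: the abstract does not describe the scoring metric or the component algorithms, so the function `ensemble_rank`, the three toy finders, and the scores they return are all hypothetical placeholders. The sketch assumes only that every finder reports its candidates on the same scale, so that scores are directly comparable.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]           # (motif pattern, score on a shared scale)
Finder = Callable[[List[str]], List[Candidate]]


def ensemble_rank(
    sequences: List[str],
    finders: Dict[str, Finder],
    top_n: int = 5,
) -> List[Tuple[str, float, str]]:
    """Run every finder and rank all candidates by the shared score.

    Returns (motif, score, finder_name) tuples, best first. If several
    finders report the same motif, only the highest-scoring report is kept.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for name, finder in finders.items():
        for motif, score in finder(sequences):
            if motif not in best or score > best[motif][0]:
                best[motif] = (score, name)
    ranked = sorted(
        ((motif, score, name) for motif, (score, name) in best.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical stand-ins for the three motif classes named in the abstract.
# Real finders would search the sequences; these return made-up toy values.
def toy_nondegenerate(sequences: List[str]) -> List[Candidate]:
    return [("ACGTGA", 4.2)]


def toy_degenerate(sequences: List[str]) -> List[Candidate]:
    return [("ACGYGA", 5.1), ("ACGTGA", 3.0)]


def toy_gapped(sequences: List[str]) -> List[Candidate]:
    return [("ACGNNNNTGA", 2.7)]


TOY_FINDERS: Dict[str, Finder] = {
    "nondegenerate": toy_nondegenerate,
    "degenerate": toy_degenerate,
    "gapped": toy_gapped,
}
```

Calling `ensemble_rank(upstream_sequences, TOY_FINDERS)` returns one merged ranking across the three classes. The point of the design is that the caller supplies no motif-specific parameters: the shared score alone decides which class of motif wins.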