Bounded search for de novo identification of degenerate cis-regulatory elements
Open Access
- 15 May 2006
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (1) , 254-17
- https://doi.org/10.1186/1471-2105-7-254
Abstract
Background: The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach. Results: In this paper, we report PRISM, a degenerate motif finder that leverages the relationship between the statistical significance of a set of binding sites and that of the individual binding sites. PRISM first identifies overrepresented, non-degenerate consensus motifs, then iteratively relaxes each one into a high-scoring degenerate motif. This approach requires no tunable parameters, thereby lending itself to unbiased performance comparisons. We therefore compare PRISM's performance against nine popular motif finders on 28 well-characterized S. cerevisiae regulons. PRISM consistently outperforms all other programs. Finally, we use PRISM to predict the binding sites of uncharacterized regulons. Our results support a proposed mechanism of action for the yeast cell-cycle transcription factor Stb1, whose binding site has not been determined experimentally. Conclusion: The relationship between statistical measures of the binding sites and the set as a whole leads to a simple means of identifying the diverse range of cis-regulatory elements to which a protein binds. This approach leverages the advantages of word-counting, in that position dependencies are implicitly accounted for and local optima are more easily avoided. While we sacrifice guaranteed optimality to prevent the exponential blowup of exhaustive search, we prove that the error is bounded and experimentally show that the performance is superior to other methods. A Java implementation of this algorithm can be downloaded from our web server at http://genie.dartmouth.edu/prism.Keywords
This publication has 39 references indexed in Scilit:
- BEAM: A Beam Search Algorithm for the Identification of Cis-Regulatory Elements in Groups of GenesJournal of Computational Biology, 2006
- Assessing computational tools for the discovery of transcription factor binding sitesNature Biotechnology, 2005
- WebLogo: A Sequence Logo Generator: Figure 1Genome Research, 2004
- Applied bioinformatics for the identification of regulatory elementsNature Reviews Genetics, 2004
- Transcriptional Regulatory Networks in Saccharomyces cerevisiaeScience, 2002
- Is there a code for protein–DNA recognition? Probab(ilistical)ly…BioEssays, 2002
- Finding Motifs Using Random ProjectionsJournal of Computational Biology, 2002
- Computational identification of Cis -regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae 1 1Edited by F. E. CohenJournal of Molecular Biology, 2000
- Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies 1 1Edited by G. von HeijneJournal of Molecular Biology, 1998
- DNA recognition code of transcription factorsProtein Engineering, Design and Selection, 1995