Bounded search for de novo identification of degenerate cis-regulatory elements

Open Access

15 May 2006

journal article
Published by Springer Nature in BMC Bioinformatics

Vol. 7 (1) , 254-17
https://doi.org/10.1186/1471-2105-7-254

Abstract

Background: The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach. Results: In this paper, we report PRISM, a degenerate motif finder that leverages the relationship between the statistical significance of a set of binding sites and that of the individual binding sites. PRISM first identifies overrepresented, non-degenerate consensus motifs, then iteratively relaxes each one into a high-scoring degenerate motif. This approach requires no tunable parameters, thereby lending itself to unbiased performance comparisons. We therefore compare PRISM's performance against nine popular motif finders on 28 well-characterized S. cerevisiae regulons. PRISM consistently outperforms all other programs. Finally, we use PRISM to predict the binding sites of uncharacterized regulons. Our results support a proposed mechanism of action for the yeast cell-cycle transcription factor Stb1, whose binding site has not been determined experimentally. Conclusion: The relationship between statistical measures of the binding sites and the set as a whole leads to a simple means of identifying the diverse range of cis-regulatory elements to which a protein binds. This approach leverages the advantages of word-counting, in that position dependencies are implicitly accounted for and local optima are more easily avoided. While we sacrifice guaranteed optimality to prevent the exponential blowup of exhaustive search, we prove that the error is bounded and experimentally show that the performance is superior to other methods. A Java implementation of this algorithm can be downloaded from our web server at http://genie.dartmouth.edu/prism.

Keywords

This publication has 39 references indexed in Scilit:

BEAM: A Beam Search Algorithm for the Identification of Cis-Regulatory Elements in Groups of Genes
Journal of Computational Biology, 2006
Assessing computational tools for the discovery of transcription factor binding sites
Nature Biotechnology, 2005
WebLogo: A Sequence Logo Generator: Figure 1
Genome Research, 2004
Applied bioinformatics for the identification of regulatory elements
Nature Reviews Genetics, 2004
Transcriptional Regulatory Networks in Saccharomyces cerevisiae
Science, 2002
Is there a code for protein–DNA recognition? Probab(ilistical)ly…
BioEssays, 2002
Finding Motifs Using Random Projections
Journal of Computational Biology, 2002
Computational identification of Cis -regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae 1 1Edited by F. E. Cohen
Journal of Molecular Biology, 2000
Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies 1 1Edited by G. von Heijne
Journal of Molecular Biology, 1998
DNA recognition code of transcription factors
Protein Engineering, Design and Selection, 1995