BEAM: A Beam Search Algorithm for the Identification of Cis-Regulatory Elements in Groups of Genes
- 1 April 2006
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 13 (3) , 686-701
- https://doi.org/10.1089/cmb.2006.13.686
Abstract
The identification of potential protein binding sites (cis-regulatory elements) in the upstream regions of genes is key to understanding the mechanisms that regulate gene expression. To this end, we present a simple, efficient algorithm, BEAM (beam-search enumerative algorithm for motif finding), aimed at the discovery of cis-regulatory elements in the DNA sequences upstream of a related group of genes. This algorithm dramatically limits the search space of expanded sequences, converting the problem from one that is exponential in the length of motifs sought to one that is linear. Unlike sampling algorithms, our algorithm converges and is capable of finding statistically overrepresented motifs with a low failure rate. Further, our algorithm is not dependent on the objective function or the organism used. Limiting the space of candidate motifs enables the algorithm to focus only on those motifs that are most likely to be biologically relevant and enables the algorithm to use direct evaluations of background frequencies instead of resorting to probabilistic estimates. In addition, limiting the space of candidate motifs makes it possible to use computationally expensive objective functions that are able to correctly identify biologically relevant motifs.Keywords
This publication has 23 references indexed in Scilit:
- The Effects of Selection Against Spurious Transcription Factor Binding SitesMolecular Biology and Evolution, 2003
- Separating real motifs from their artifactsBioinformatics, 2001
- Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotationBioinformatics, 2000
- Computational identification of Cis -regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae 1 1Edited by F. E. CohenJournal of Molecular Biology, 2000
- Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.Bioinformatics, 1999
- Predicting Gene Regulatory Elements in Silico on a Genomic ScaleGenome Research, 1998
- Approaches to the Automatic Discovery of Patterns in BiosequencesJournal of Computational Biology, 1998
- Unsupervised learning of multiple motifs in biopolymers using expectation maximizationMachine Learning, 1995
- The Individual Ergodic Theorem of Information TheoryThe Annals of Mathematical Statistics, 1957
- On Information and SufficiencyThe Annals of Mathematical Statistics, 1951