LOGOS: A MODULAR BAYESIAN MODEL FOR DE NOVO MOTIF DETECTION
- 1 March 2004
- journal article
- research article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 02 (01), 127-154
- https://doi.org/10.1142/s0219720004000508
Abstract
The complexity of the global organization and internal structure of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection, it is necessary to model the complex dependencies within and among motifs and to incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences, which provides a principled framework for developing, modularizing, extending and computing expressive motif models for complex biopolymer sequence analysis. LOGOS consists of two interacting submodels: HMDM, a local alignment model capturing biological prior knowledge and positional dependency within the motif local structure; and HMM, a global motif distribution model modeling frequencies and dependencies of motif occurrences. Model parameters can be fit using training motifs within an empirical Bayesian framework. A variational EM algorithm is developed for de novo motif detection. LOGOS improves over existing models that ignore biological priors and dependencies in motif structures and motif occurrences, and demonstrates superior performance on both semi-realistic test data and cis-regulatory sequences from yeast and Drosophila genomes with regard to sensitivity, specificity, flexibility and extensibility.
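As a rough illustration of the "global" half of this design only, the sketch below scores a DNA sequence under a toy hidden Markov model with one background state and one chain of motif-position states, using the forward algorithm. It is not the LOGOS model: the fixed position weight matrix stands in for the local submodel, there is no HMDM prior and no variational EM, and every name and number in it (`consensus_pwm`, `forward_log_likelihood`, `p_start`, the `TATA` consensus, the 0.85 match probability) is a hypothetical choice made for the example.

```python
import math

BASES = "ACGT"

def consensus_pwm(consensus, match=0.85):
    """Toy position weight matrix: one dict of base probabilities per motif position.

    Assumption for illustration: the consensus base gets probability `match`,
    and the remaining mass is split evenly over the other three bases.
    """
    miss = (1.0 - match) / 3.0
    return [{b: (match if b == c else miss) for b in BASES} for c in consensus]

def forward_log_likelihood(seq, pwm, p_start, background):
    """Log-likelihood of `seq` under a toy background/motif HMM.

    States: 0 = background, 1..W = motif positions 1..W.
    From background (or from the last motif position) the chain enters
    motif position 1 with probability p_start, otherwise it moves to
    background. Inside the motif, positions advance deterministically.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    W = len(pwm)
    log_lik = 0.0

    # Initial step: start in background or at motif position 1.
    alpha = [0.0] * (W + 1)
    alpha[0] = (1.0 - p_start) * background[seq[0]]
    alpha[1] = p_start * pwm[0][seq[0]]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    log_lik += math.log(scale)

    for x in seq[1:]:
        new = [0.0] * (W + 1)
        # Probability mass that is free to start a new motif or stay in background.
        free = alpha[0] + alpha[W]
        new[0] = free * (1.0 - p_start) * background[x]
        new[1] = free * p_start * pwm[0][x]
        # Deterministic advance through motif positions 2..W.
        for k in range(2, W + 1):
            new[k] = alpha[k - 1] * pwm[k - 1][x]
        scale = sum(new)
        alpha = [a / scale for a in new]
        log_lik += math.log(scale)
    return log_lik

# Usage: compare a sequence containing the consensus with one that does not.
PWM = consensus_pwm("TATA")
BG = {b: 0.25 for b in BASES}
with_site = forward_log_likelihood("GCGTATAGCG", PWM, 0.1, BG)
without_site = forward_log_likelihood("GCGATTAGCG", PWM, 0.1, BG)
print(with_site, without_site)
```

In this toy setting the sequence containing the consensus receives the higher log-likelihood, which is the signal a motif detector exploits. A fuller treatment in the spirit of the abstract would replace the fixed matrix with a learned local model and estimate all parameters from data rather than fixing them by hand.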