A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length
Open Access
- 22 February 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (10), 2240-2245
- https://doi.org/10.1093/bioinformatics/bti336
Abstract
Motivation: Transcription factors often bind DNA as homodimers or heterodimers. They therefore recognize structured DNA motifs: inverted repeats, direct repeats or spaced motif pairs. However, these motifs are often difficult to identify because of their high divergence. Including the motif structure explicitly in the motif recognition algorithm improves both the recognition of highly divergent motifs and the estimation of motif geometric parameters.

Results: We present SeSiMCMC (Sequence Similarities by Markov Chain Monte Carlo), a modification of the Gibbs sampling motif extraction algorithm that finds structured motifs of these types, as well as non-structured motifs, in a set of unaligned DNA sequences. It employs improved estimators of motif and spacer lengths. The probability that a sequence contains no motif is accounted for in a rigorous Bayesian manner. We applied the algorithm to a set of upstream regions of genes from two Escherichia coli regulons involved in respiration and showed that accounting for a symmetric motif structure allows the algorithm to identify weak motifs more accurately. In the examples studied, ArcA binding sites were shown to have the structure of a direct spaced repeat, whereas NarP binding sites exhibited a palindromic structure.

Availability: The web interface of the program and its FreeBSD (4.0) and Win32 console executables are available at http://bioinform.genetika.ru/SeSiMCMC

Contact: favorov@sensi.org

Supplementary information: Supplementary material is available at http://bioinform.genetika.ru/SeSiMCMC
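The abstract combines three ingredients: Gibbs sampling of site positions, a symmetric (here palindromic) motif model, and an explicit probability that a sequence carries no site. The Python sketch below shows one way these can fit together in a basic site-sampling Gibbs sampler. It is not the SeSiMCMC code. All function names are hypothetical, the motif width is fixed rather than estimated, direct and spaced repeats are not modelled, inverted-repeat symmetry is imposed simply by counting each site together with its reverse complement, and the "no motif" case uses an assumed zero-or-one-occurrence prior with a fixed `p_site` instead of the paper's full Bayesian treatment.

```python
import random

# Hypothetical sketch, not SeSiMCMC: fixed motif width, palindromic symmetry only,
# sequences assumed to be uppercase A/C/G/T strings.
ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def build_pwm(sites, width, palindromic=False, pseudocount=0.5):
    """Column-wise base frequencies estimated from aligned sites.

    With palindromic=True every site is counted together with its reverse
    complement, which forces column i and column width-1-i to be
    complementary (one simple way to impose inverted-repeat symmetry).
    """
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for site in sites:
        for s in ([site, revcomp(site)] if palindromic else [site]):
            for i, b in enumerate(s):
                counts[i][b] += 1
    return [{b: c / sum(col.values()) for b, c in col.items()} for col in counts]


def likelihood_ratio(window, pwm, background):
    """P(window | motif) / P(window | background); real tools would work in logs."""
    ratio = 1.0
    for i, b in enumerate(window):
        ratio *= pwm[i][b] / background[b]
    return ratio


def site_posterior(seq, pwm, background, p_site):
    """Posterior over 'no site' (index 0) and each start position j (index j + 1).

    Assumed zero-or-one-occurrence prior: no site with probability
    1 - p_site, otherwise a start drawn uniformly from the valid positions.
    """
    width = len(pwm)
    n_pos = len(seq) - width + 1
    weights = [1.0 - p_site]
    for start in range(max(n_pos, 0)):
        window = seq[start:start + width]
        weights.append(p_site / n_pos * likelihood_ratio(window, pwm, background))
    total = sum(weights)
    return [w / total for w in weights]


def sample_site(seq, pwm, background, p_site, rng):
    """Draw a site start for one sequence, or None for 'no motif'."""
    probs = site_posterior(seq, pwm, background, p_site)
    choice = rng.choices(range(len(probs)), weights=probs)[0]
    return None if choice == 0 else choice - 1


def gibbs_motif_search(seqs, width, palindromic=True, p_site=0.8,
                       iters=200, seed=0):
    """Site-sampling Gibbs sampler with a fixed motif width (no length estimation)."""
    rng = random.Random(seed)
    counts = {b: 1.0 for b in ALPHABET}  # background frequencies, pseudocount 1
    for s in seqs:
        for b in s:
            counts[b] += 1
    total = sum(counts.values())
    background = {b: c / total for b, c in counts.items()}
    sites = [rng.randrange(len(s) - width + 1) if len(s) >= width else None
             for s in seqs]
    for _ in range(iters):
        for k, seq in enumerate(seqs):
            # Rebuild the motif model from every other sequence's current site.
            others = [seqs[j][sites[j]:sites[j] + width]
                      for j in range(len(seqs))
                      if j != k and sites[j] is not None]
            pwm = build_pwm(others, width, palindromic)
            sites[k] = sample_site(seq, pwm, background, p_site, rng)
    return sites
```

For example, `gibbs_motif_search(upstream_seqs, width=16)` would return one start index (or `None`) per sequence, where `upstream_seqs` is a hypothetical list of uppercase upstream regions. Counting reverse complements ties each column to its mirror column, roughly halving the free parameters of the motif model; a direct-repeat variant could instead pool column i with column i + (half-site length + spacer length).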