Identification of degenerate motifs using position restricted selection and hybrid ranking combination
Open Access
- 27 November 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 34 (22) , 6379-6391
- https://doi.org/10.1093/nar/gkl658
Abstract
The identification of regulatory elements recognized by transcription factors and chromatin remodeling factors is essential to studying the regulation of gene expression. When no auxiliary data, such as orthologous sequences or expression profiles, are used, the accuracy of most motif discovery tools is strongly influenced by the motif degeneracy and the length of the input sequences. Since suitable auxiliary data may not always be available, more work is needed to improve the performance of tools that identify transcription elements in metazoans. A non-alignment-based algorithm, MotifSeeker, is proposed to enhance the accuracy of discovering degenerate motifs. MotifSeeker exploits the property that the variable sites of transcription elements are usually position-specific in order to reduce exposure to noise. Consequently, the efficiency and accuracy of motif identification are improved. Using data fusion, the ranking process integrates two measures of motif significance, resulting in a more robust significance measure. Test results on synthetic data reveal that the accuracy of MotifSeeker is less sensitive to the motif degeneracy and the length of the input sequences. Furthermore, MotifSeeker has been tested on a well-known benchmark [M. Tompa, N. Li, T.L. Bailey, G.M. Church, B. De Moor, E. Eskin, A.V. Favorov, M.C. Frith, Y. Fu, W.J. Kent, et al. (2005) Nat. Biotechnol., 23, 137-144], yielding a correlation coefficient of 0.262, which compares favorably with those of other tools. The high applicability of MotifSeeker to biological data is further demonstrated experimentally on regulons of Saccharomyces cerevisiae and liver-specific genes with experimentally verified regulatory elements.
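To make the reported benchmark figure easier to interpret, the sketch below computes a nucleotide-level correlation coefficient between known and predicted binding-site positions. It assumes that the "correlation coefficient" in the abstract is the nucleotide-level statistic (nCC) used in the cited benchmark of Tompa et al., which has the form of a Pearson (Matthews) correlation over per-nucleotide labels; the abstract itself does not state this. The function name `nucleotide_cc`, the 0/1 mask representation, and the choice to return 0.0 when the statistic is undefined are illustrative assumptions, not part of MotifSeeker.

```python
from math import sqrt


def nucleotide_cc(known, predicted):
    """Correlation between known and predicted site masks.

    `known` and `predicted` are equal-length sequences of 0/1 flags,
    one per nucleotide (1 = position lies inside a binding site).
    Hypothetical helper for illustration only.
    """
    if len(known) != len(predicted):
        raise ValueError("masks must have the same length")

    # Per-nucleotide confusion counts.
    tp = sum(1 for k, p in zip(known, predicted) if k and p)
    fp = sum(1 for k, p in zip(known, predicted) if not k and p)
    fn = sum(1 for k, p in zip(known, predicted) if k and not p)
    tn = sum(1 for k, p in zip(known, predicted) if not k and not p)

    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Assumption: treat the undefined case (a constant mask) as 0.0.
        return 0.0
    return (tp * tn - fn * fp) / denom


# A 100-nt sequence with a 10-nt known site; the prediction covers
# half of the site and spills 5 nt outside it.
known = [1] * 10 + [0] * 90
predicted = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
score = nucleotide_cc(known, predicted)
```

Under this reading, a value of 1 means that predicted and known sites coincide exactly, and a value near 0 means the predictions are no better than chance at the nucleotide level.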