An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance

Open Access

7 February 2008

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 9 (1), 1-13
https://doi.org/10.1186/1471-2105-9-89

Abstract

Motif finding algorithms have become increasingly capable of using computationally efficient methods to detect patterns in biological sequences. However, the subsequent classification of their output still suffers from limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in DNA sequences, which might not only indicate that a pattern is important but also provide hints about the positions where such patterns occur preferentially. We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of motif classification. Using artificial data, we compared three statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed best on this dataset, we proceeded to study the positional distribution of several well-known cis-regulatory elements in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several dicotyledonous plants). The results show that position conservation is relevant for the transcriptional machinery. We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes and, therefore, that non-uniformity is a good indicator of biological relevance and can be used to complement the commonly used over-representation tests. In this article we present the results obtained for the S. cerevisiae data sets.
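As an illustration of what a position uniformity test can look like, the following is a minimal Python sketch, not the authors' implementation. It bins motif start positions along a promoter region, computes a chi-square statistic against a uniform expectation, and estimates a p-value by simulating uniformly placed motifs (a Monte Carlo approximation in the spirit of a chi-square bootstrap). The function names, the bin count, the number of simulations and the promoter length of 1000 bp in the usage note are all assumptions made for this example.

```python
import random


def chi_square_statistic(positions, region_length, n_bins):
    """Chi-square statistic of binned positions against equal expected counts."""
    if not positions:
        raise ValueError("at least one motif position is required")
    counts = [0] * n_bins
    for p in positions:
        # Map a 0-based position in [0, region_length) to a bin index.
        idx = min(p * n_bins // region_length, n_bins - 1)
        counts[idx] += 1
    expected = len(positions) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


def simulated_uniformity_p_value(positions, region_length,
                                 n_bins=10, n_sim=2000, seed=0):
    """Estimate P(statistic >= observed) under uniform motif placement.

    The null distribution is obtained by simulation rather than from the
    chi-square table, so small expected counts per bin are less of a concern.
    """
    rng = random.Random(seed)
    observed = chi_square_statistic(positions, region_length, n_bins)
    n = len(positions)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(region_length) for _ in range(n)]
        if chi_square_statistic(simulated, region_length, n_bins) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

For example, `simulated_uniformity_p_value([100 + i % 20 for i in range(60)], 1000)` describes 60 hypothetical occurrences clustered in a 20 bp window of a 1000 bp region and yields a very small p-value, whereas evenly spread positions such as `list(range(0, 1000, 10))` give a p-value of 1. A small p-value here only indicates positional non-uniformity; in the approach described above it would complement, not replace, an over-representation test.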
