Predicting Gene Regulatory Elements in Silico on a Genomic Scale
Open Access
- 1 November 1998
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 8 (11), 1202-1215
- https://doi.org/10.1101/gr.8.11.1202
Abstract
We performed a systematic analysis of gene upstream regions in the yeast genome for occurrences of regular expression-type patterns with the goal of identifying potential regulatory elements. To achieve this goal, we have developed a new sequence pattern discovery algorithm that searches exhaustively for a priori unknown regular expression-type patterns that are over-represented in a given set of sequences. We applied the algorithm in two cases, (1) discovery of patterns in the complete set of >6000 sequences taken upstream of the putative yeast genes and (2) discovery of patterns in the regions upstream of the genes with similar expression profiles. In the first case, we looked for patterns that occur more frequently in the gene upstream regions than in the genome overall. In the second case, first we clustered the upstream regions of all the genes by similarity of their expression profiles on the basis of publicly available gene expression data and then looked for sequence patterns that are over-represented in each cluster. In both cases we considered each pattern that occurred at least in some minimum number of sequences, and rated them on the basis of their over-representation. Among the highest rating patterns, most have matches to substrings in known yeast transcription factor-binding sites. Moreover, several of them are known to be relevant to the expression of the genes from the respective clusters. Experiments on simulated data show that the majority of the discovered patterns are not expected to occur by chance.
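The rating step described in the abstract (keep patterns that occur in at least some minimum number of sequences, then rank them by over-representation) can be illustrated with a small sketch. This is not the authors' algorithm: it enumerates only fixed-length substrings rather than regular expression-type patterns, and the binomial tail score, the add-one smoothing, and all names (`sequences_containing`, `binomial_tail`, `rate_patterns`, `min_support`) are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def sequences_containing(sequences, k):
    """For every substring of length k, count how many sequences contain it."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each pattern is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts


def binomial_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))


def rate_patterns(target, background, k=4, min_support=2):
    """Rank length-k patterns by how unexpectedly often they hit the target set.

    Returns (score, pattern, hits) tuples sorted so the most
    over-represented pattern (smallest tail probability) comes first.
    """
    target_counts = sequences_containing(target, k)
    background_counts = sequences_containing(background, k)
    rated = []
    for pattern, hits in target_counts.items():
        if hits < min_support:
            continue  # ignore patterns seen in too few target sequences
        # Assumption: add-one smoothing keeps the background rate above zero.
        p = (background_counts.get(pattern, 0) + 1) / (len(background) + 2)
        rated.append((binomial_tail(len(target), hits, p), pattern, hits))
    return sorted(rated)


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    upstream = ["ACGTGGCA", "TTGGCATT", "CCTGGCAA"]
    reference = ["AAAAAAAA", "CCCCCCCC", "ACACACAC", "GTGTGTGT", "TTTTAAAA"]
    for score, pattern, hits in rate_patterns(upstream, reference):
        print(pattern, hits, score)
```

In this sketch `target` stands in for a set of upstream regions (or one expression cluster) and `background` for the reference set it is compared against. Handling wildcards or character groups would need a different enumeration strategy, which is beyond what this example shows.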