Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals
- 15 February 2000
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 28 (4), 1000-1010
- https://doi.org/10.1093/nar/28.4.1000
Abstract
The study of a few genes has permitted the identification of three elements that constitute a yeast polyadenylation signal: the efficiency element (EE), the positioning element and the actual site for cleavage and polyadenylation. In this paper we perform an analysis of oligonucleotide composition on the sequences located downstream of the stop codon of all yeast genes. Several oligonucleotide families appear over-represented with a high significance (referred to herein as 'words'). The family with the highest over-representation includes the oligonucleotides shown experimentally to play a role as EEs. The word with the highest score is TATATA, followed, among others, by a series of single-nucleotide variants (TATGTA, TACATA, TAAATA...) and one-letter shifts (ATATAT). A position analysis reveals that those words have a high preference to be in 3' flanks of yeast genes and there they have a very uneven distribution, with a marked peak around 35 bp after the stop codon. Of the predicted ORFs, 85% show one or more of those sequences. Similar results were obtained using a data set of EST sequences. Other clusters of over-represented words are also detected, namely T- and A-rich signals. Using these results and previously known data we propose a general model for the 3' trailers of yeast mRNAs.
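The kind of analysis summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state which background model or significance statistic was used, so the sketch assumes a simple independent-base background and reports a plain observed/expected ratio. The function names (`count_words`, `obs_exp_ratio`, `word_positions`) and the input format (a list of downstream sequences, each starting just after the stop codon) are hypothetical.

```python
from collections import Counter
from math import prod


def count_words(seqs, k=6):
    """Count every overlapping k-letter word across all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(w) <= set("ACGT"):
                counts[w] += 1
    return counts


def base_freqs(seqs):
    """Single-nucleotide frequencies over the whole data set."""
    c = Counter(b for s in seqs for b in s.upper() if b in "ACGT")
    total = sum(c.values())
    return {b: c[b] / total for b in "ACGT"}


def obs_exp_ratio(seqs, k=6):
    """Observed/expected ratio per word.

    ASSUMPTION: the expected count comes from an independent-base
    background, not from the statistic used in the paper.
    """
    counts = count_words(seqs, k)
    freqs = base_freqs(seqs)
    n_windows = sum(counts.values())
    return {
        w: obs / (n_windows * prod(freqs[b] for b in w))
        for w, obs in counts.items()
    }


def word_positions(seqs, word):
    """0-based start offsets of a word, including overlapping hits."""
    positions = []
    for s in seqs:
        s = s.upper()
        start = s.find(word)
        while start != -1:
            positions.append(start)
            start = s.find(word, start + 1)
    return positions
```

Sorting the output of `obs_exp_ratio` by value would rank candidate words, and a histogram of `word_positions(seqs, "TATATA")` would show whether hits cluster at a particular distance from the stop codon. A real analysis would also need a significance test and a more realistic background model.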