Enrichment of oligonucleotide sets with transcription control signals. III: DNA from non-mammalian vertebrates

Abstract
We studied the frequency distribution of all 1 048 576 (4¹⁰) possible oligonucleotides 10 bp long in a sample of 1.072 × 10⁶ bases of genes from non-mammalian vertebrates, consisting of 322 sequences extracted from EMBL(R) 29.0, with the aim of detecting transcription control signals. Among all decamers, 2097 (0.2%) occurred more than 10 times as often as the mean and were subjected to further statistical analysis. For each of these 2097 decamers (parents), we counted the individual frequencies of the 30 decamers differing from the parent by a single-base substitution (progeny) and calculated two variance/mean chi squares for the progeny, one with and one without the parent decamer. From the distribution of the ratio between the two chi squares, we found that of the 2097 decamers occurring >10 times more frequently than average, 1017 had a chi-square ratio between 1 and 1.5. In this final set, which corresponds to <0.097% of all possible decamers, 75 decamers were found to contain 100 transcription control elements, such as CCAAT and others. The final set contains a large excess of signals compared with 100 random sets of 1017 decamers. Some of the decamers selected by the procedure are members of consensus sequences rather than unique sequences.
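The parent/progeny selection step can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' program, and it rests on assumptions the abstract does not state: decamers are counted in overlapping windows on the given strand only, windows containing non-ACGT symbols are skipped, the mean frequency is taken over all 4¹⁰ possible decamers, the variance/mean chi square is the index of dispersion Σ(x − x̄)²/x̄, and the ratio is the chi square with the parent divided by the chi square without it. The function names (`count_decamers`, `one_base_progeny`, `dispersion_chi2`, `select_decamers`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

BASES = "ACGT"


def count_decamers(seqs: Iterable[str], k: int = 10) -> Counter:
    """Count overlapping k-mers on the given strand (assumption), skipping windows with non-ACGT symbols."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if all(b in BASES for b in word):
                counts[word] += 1
    return counts


def one_base_progeny(parent: str) -> List[str]:
    """All words differing from `parent` by one substitution: 3 * len(parent), i.e. 30 for a decamer."""
    progeny = []
    for i, base in enumerate(parent):
        for alt in BASES:
            if alt != base:
                progeny.append(parent[:i] + alt + parent[i + 1:])
    return progeny


def dispersion_chi2(values: List[int]) -> float:
    """Variance/mean chi square (index of dispersion): sum((x - mean)^2) / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((x - mean) ** 2 for x in values) / mean


def select_decamers(seqs: Iterable[str], k: int = 10, fold: float = 10.0,
                    lo: float = 1.0, hi: float = 1.5) -> Dict[str, float]:
    """Return {parent: chi-square ratio} for parents whose ratio lies in [lo, hi]."""
    counts = count_decamers(seqs, k)
    mean_freq = sum(counts.values()) / 4 ** k  # assumed: mean over all 4^k possible words
    parents = [w for w, c in counts.items() if c > fold * mean_freq]
    selected: Dict[str, float] = {}
    for parent in parents:
        # Counter returns 0 for unseen progeny without inserting them
        progeny = [counts[w] for w in one_base_progeny(parent)]
        chi_without = dispersion_chi2(progeny)
        if chi_without == 0:
            continue  # ratio undefined when all progeny counts are equal
        chi_with = dispersion_chi2(progeny + [counts[parent]])
        ratio = chi_with / chi_without  # assumed direction: with parent / without parent
        if lo <= ratio <= hi:
            selected[parent] = ratio
    return selected


if __name__ == "__main__":
    toy = ["ACGTACGTAC"] * 3 + ["CCGTACGTAC"] * 2 + ["GCGTACGTAC"] * 2
    print(select_decamers(toy))
```

On the toy input (three copies of ACGTACGTAC plus two copies each of two of its single-base mutants), only the parent falls in the 1 to 1.5 window, with a ratio of about 1.22; each mutant scores about 0.94, because its more frequent neighbour already inflates the progeny dispersion before the mutant itself is added.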
