Abstract
The GT doublet is well conserved at the 5′ exon/intron splice junction and is frequently embedded in the AGGT quartet. Although only the G of the GT doublet is invariant, splicing and ligation are executed accurately. In this work we search for additional conserved signals that may aid 5′ splice site recognition. The searches are extensive and are not limited to a preconceived consensus sequence: we investigate the distributions of all 256 quartets in a 1000-nucleotide span around the 5′ splice sites of ~1700 eukaryotic nuclear precursor mRNAs. Several potential signals are noted. Of particular interest are quartets containing runs of G (e.g., G4, G3T, G3C, G3A and AG3) in the intron immediately downstream of the 5′ splice site, and some C-containing quartets in the exon upstream of it. In an analogous calculation, (A)GGG(A) is also found to be frequent in the intron, 60 nucleotides upstream of the 3′ splice site, and (A)CCC(A) in the exon downstream of it. These results are consistent with recent indications that exon sequences may play a role in efficient splicing. Some models are proposed.
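The quartet search described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function name `quartet_position_counts`, the `flank` parameter, the offset convention (offset 0 is the first intron nucleotide, i.e. the G of GT) and the two toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# All 4**4 = 256 possible quartets over the DNA alphabet.
QUARTETS = ["".join(p) for p in product("ACGT", repeat=4)]


def quartet_position_counts(records, flank=500):
    """Tally every quartet at every offset from the 5' splice site.

    records: iterable of (sequence, site) pairs, where `site` is the
             0-based index of the first intron nucleotide (hypothetical
             input format, assumed for this sketch).
    flank:   nucleotides scanned on each side of the site, so the
             default covers a 1000-nucleotide span.
    Returns a dict: quartet -> {offset of quartet start: count}.
    """
    counts = {q: defaultdict(int) for q in QUARTETS}
    for seq, site in records:
        seq = seq.upper()
        start = max(0, site - flank)
        end = min(len(seq) - 4, site + flank - 1)
        for i in range(start, end + 1):
            quartet = seq[i:i + 4]
            # Skip windows with ambiguous bases such as N.
            if quartet in counts:
                counts[quartet][i - site] += 1
    return counts


# Toy sequences (made up), each with the GT doublet starting at index 5.
toy_records = [("CCCAGGTGGGT", 5), ("ACCAGGTAGGG", 5)]
toy_counts = quartet_position_counts(toy_records, flank=500)
print(dict(toy_counts["AGGT"]))
```

Negative offsets fall in the exon and non-negative offsets in the intron, so summing a quartet's counts over each side gives a crude upstream/downstream comparison. Judging whether a quartet is over-represented would additionally require a background expectation, which this sketch does not attempt.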