Highly structured sequence homology between an insertion element and the gene in which it resides

Abstract
The recessive allele for soybean seed lectin results from the insertion of a DNA segment (designated Tgm1) into the coding region of the gene. The termini of Tgm1 display structural features characteristic of a transposable element. The complete sequence of Tgm1 contains 3550 base pairs (bp) and can be divided into three regions: a left arm, a midsection and a right arm. No large open reading frames were found, but an extensive, highly structured border with homology to the lectin gene was revealed. The left border (726 bp), which comprises most of the left arm, and the extreme right border (144 bp) of the right arm consist of various forms of a basic 54-bp repeating unit. This unit, composed of a stem-loop structure and an interhairpin sequence, occurs 13 times in the left arm and twice in the right arm of Tgm1. Progressively degenerate forms of the repeating unit appear toward the termini of Tgm1, but its dyad symmetry remains highly conserved. Seven nucleotides (A-C-A-T-C-G-G and its complement) conserved within the stem also appear as a subset of the inverted repeats found at nearly equal distances from the target site in the lectin gene. Including its occurrences in the inverted-repeat termini and in a duplication within the left arm, this 7-bp sequence occurs a total of 33 times in Tgm1. The dyad symmetries containing this sequence may be involved in target gene selection. The repeating-unit organization of Tgm1 characterizes a distinct class of eukaryotic elements that includes representatives mobile in snapdragon and maize.
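To make concrete what counting "A-C-A-T-C-G-G and its complement" on one strand involves, the following is a minimal Python sketch. Read 5' to 3' on the same strand, the complementary half of an inverted repeat is the reverse complement of the motif (here CCGATGT), and a stem-loop corresponds to the motif followed, after a short loop, by that reverse complement. The function names, the `max_gap` loop-length parameter and the toy sequence are hypothetical illustrations; this is not the authors' analysis method and uses no Tgm1 sequence data.

```python
from typing import List, Tuple

# Translation table for base complementation (hypothetical helper, upper/lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of motif in seq, allowing overlaps."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def count_motif_both_orientations(seq: str, motif: str) -> int:
    """Count the motif plus its reverse complement on a single strand.

    A palindromic motif equals its own reverse complement, so it is
    counted only once to avoid double counting.
    """
    rc = reverse_complement(motif)
    total = len(find_all(seq, motif))
    if rc != motif:
        total += len(find_all(seq, rc))
    return total


def inverted_repeat_pairs(seq: str, motif: str, max_gap: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where motif starts at i and its reverse complement
    starts at j downstream, separated by a loop of 0..max_gap bases.

    Such a pair can fold into a stem-loop (hairpin) with the motif as the stem.
    """
    rc = reverse_complement(motif)
    k = len(motif)
    rc_hits = find_all(seq, rc)
    pairs = []
    for i in find_all(seq, motif):
        for j in rc_hits:
            loop = j - (i + k)  # bases between the end of the motif and the start of its partner
            if 0 <= loop <= max_gap:
                pairs.append((i, j))
    return pairs


# Toy example: motif, 4-nt loop, reverse complement.
toy = "TT" + "ACATCGG" + "AAAA" + "CCGATGT" + "TT"
print(reverse_complement("ACATCGG"))                 # CCGATGT
print(count_motif_both_orientations(toy, "ACATCGG"))  # 2
print(inverted_repeat_pairs(toy, "ACATCGG", 10))      # [(2, 13)]
```

Counting overlapping matches and handling palindromic motifs separately are design choices of this sketch; a real analysis of degenerate repeat copies would need approximate rather than exact matching.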