L1 family of repetitive DNA sequences in primates may be derived from a sequence encoding a reverse transcriptase-related protein

Abstract
Primate and rodent genomes contain a family of highly repetitive, long interspersed sequences, designated the L1 family or LINE-1 (refs 1–4). Characteristic features of L1 family sequences, such as an A-rich stretch at the 3′ end, a truncated 5′ end, the existence of significantly long open reading frames (ORFs; refs 5–9) and the presence of L1 family transcripts in various types of cells, including pluripotential embryonic cells (refs 10–15), suggest that the L1 family is derived from a sequence encoding a protein(s) and dispersed in the genome through an RNA-mediated process. These features of the L1 family are believed to be due to reverse transcription beginning at the 3′ end of the L1 transcript and terminating prematurely, and to the site duplication caused by the insertion of the complementary DNA (reviewed in refs 3, 4). It is likely that this type of transcript is converted to cDNA and inserted into the chromosome through a process similar to that of the formation of processed pseudogenes (ref. 16). The above model, however, does not necessarily explain why the L1 family should have produced the extraordinarily large number of copies (more than 10⁴ per haploid genome; ref. 17) seen during evolution. It seems likely that the progenitor of the L1 family itself carries (or carried) a function which promotes the active dispersion of the L1 family sequence. We reasoned that such a function, if present, must be conserved during evolution and may be shown by comparative analysis of L1 family sequences from evolutionarily distant species. We show here that the L1 family sequence contains an ORF possessing significant sequence homology to several RNA-dependent DNA polymerases of viral and transposable element origins. This provides a plausible explanation for the preferential and active dispersion of the L1 family sequence during evolution.