Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses into Plant Genomes

Abstract
Non-retroviral RNA virus sequences (NRVSs) have been found in the chromosomes of vertebrates and fungi, but not plants. Here we report similarly endogenized NRVSs derived from positive-sense, negative-sense, and double-stranded RNA viruses in plant chromosomes. These sequences were found by searching public genomic sequence databases and, importantly, most NRVSs were subsequently detected by direct molecular analyses of plant DNA. The most widespread NRVSs were related to the coat protein (CP) genes of viruses of the family Partitiviridae, which have bisegmented dsRNA genomes and include plant- and fungus-infecting members. The CP of a novel fungal virus (Rosellinia necatrix partitivirus 2, RnPV2) had the greatest sequence similarity to Arabidopsis thaliana ILR2, which is thought to regulate the activity of the phytohormone auxin (indole-3-acetic acid, IAA). Furthermore, partitivirus CP-like sequences much more closely related to those of plant partitiviruses than to that of RnPV2 were identified in a wide range of plant species. In addition, sequences related to the nucleocapsid protein genes of cytorhabdoviruses and varicosaviruses were found in species of more than nine plant families, including Brassicaceae and Solanaceae. A replicase-like sequence of a betaflexivirus was identified in the cucumber genome. The pattern of occurrence of NRVSs and the phylogenetic analyses of NRVSs and related viruses indicate that multiple independent integrations into many plant lineages may have occurred. For example, one of the NRVSs was retained in Ar. thaliana but not in Ar. lyrata or other related Camelina species, whereas another NRVS displayed the reverse pattern. Our study shows that single- and double-stranded RNA viral sequences are widespread in plant genomes and that genome-integrated NRVSs have the potential to help resolve unclear phylogenetic relationships among plant species.

Author Summary
Eukaryotic genomes contain sequences that originated from DNA viruses and reverse-transcribing viruses, i.e., retroviruses, pararetroviruses (DNA viruses), and transposons. However, the sequences of non-retroviral RNA viruses, which are unable to convert their genomes to DNA, were until recently thought not to be integrated into eukaryotic nuclear genomes. We present evidence for multiple independent events of horizontal gene transfer from a wide range of RNA viruses, including plus-sense, minus-sense, and double-stranded RNA viruses, into the genomes of distantly related plant lineages. Some non-retroviral integrated RNA viral sequences are conserved across genera within a plant family, whereas others are retained in only a limited number of species within a genus. The integration profiles of these sequences demonstrate their potential to serve as powerful molecular tools for deciphering phylogenetic relationships among related plants. Moreover, this study highlights the co-option of non-retroviral RNA virus sequences by plants, and provides insights into plant genome evolution and the interplay between non-reverse-transcribing RNA viruses and their hosts.