The Evolutionary Gain of Spliceosomal Introns: Sequence and Phase Preferences

Abstract
Theories of the evolution of spliceosomal introns differ in the extent to which they attribute the distribution of introns to a formative role in the evolution of protein-coding genes or to the adventitious gain of genetic elements. Here, systematic methods are used to assess the causes of the present-day distribution of introns in 10 families of eukaryotic protein-coding genes comprising 1,868 introns in 488 distinct alignment positions. The history of intron evolution, inferred using a probabilistic model that allows for ancestral inheritance, gain, and loss of introns, reveals that the vast majority of introns in these eukaryotic gene families were not inherited from the most recent common ancestral genes but were gained subsequently. Furthermore, among inferred events of intron gain that meet strict criteria of reliability, the sites of gain show a 5:3:2 ratio of reading-frame phases 0, 1, and 2, respectively, and a nucleotide preference for MAG GT (positions −3 to +2 relative to the site of gain). The nucleotide preferences of intron gain may prove to be the ultimate cause of the phase bias. The phase bias of intron gain is sufficient to account quantitatively for the well-known 5:3:2 bias in phase frequencies among extant introns, a conclusion that holds even when taxonomic heterogeneity in phase patterns is considered. Thus, intron gain accounts for the vast majority of extant introns and for the bias toward phase 0 introns that was previously interpreted as evidence for ancient formative introns.
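
For readers unfamiliar with the terms, the two quantities discussed above can be illustrated with a minimal sketch. It is not the probabilistic model used in this study; it only shows how the phase of an intron position follows from the number of coding nucleotides that precede it, and how a site can be tested against the MAG GT pattern, where M is the IUPAC code for A or C. The function names (intron_phase, matches_mag_gt, candidate_sites) and the toy sequence are hypothetical and chosen only for illustration.

```python
def intron_phase(coding_offset: int) -> int:
    """Phase of an intron preceded by `coding_offset` coding nucleotides.

    Phase 0 lies between two codons, phase 1 after the first base of a
    codon, and phase 2 after the second base.
    """
    return coding_offset % 3


def matches_mag_gt(cds: str, site: int) -> bool:
    """Test whether the five bases flanking `site` fit the MAG GT pattern.

    `site` is the index of the first base downstream of the candidate
    insertion point, so positions -3..-1 are cds[site-3:site] and
    positions +1..+2 are cds[site:site+2].
    """
    if site < 3 or site + 2 > len(cds):
        return False  # not enough flanking sequence to evaluate
    upstream = cds[site - 3:site].upper()
    downstream = cds[site:site + 2].upper()
    # M = A or C (IUPAC), followed by AG upstream and GT downstream.
    return upstream[0] in "AC" and upstream[1:] == "AG" and downstream == "GT"


def candidate_sites(cds: str) -> list[tuple[int, int]]:
    """List (site, phase) pairs for all positions that fit MAG GT."""
    return [
        (site, intron_phase(site))
        for site in range(3, len(cds) - 1)
        if matches_mag_gt(cds, site)
    ]


if __name__ == "__main__":
    toy_cds = "ATGCAGGTTTAA"  # hypothetical sequence, illustration only
    for site, phase in candidate_sites(toy_cds):
        print(f"site {site}: phase {phase}")
```

Tallying the phases returned by such a scan over real coding sequences is one way to examine whether a nucleotide preference alone would favor phase 0, which is the possibility raised above; the sketch makes no claim about the outcome.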