A Simple Physical Model Predicts Small Exon Length Variations

Abstract
Small exon length variations, caused by the use of alternative donor or acceptor splice sites that are in very close proximity on the pre-mRNA, are among the most common splice variations. Of these, three-nucleotide variations at so-called NAGNAG tandem acceptor sites have recently attracted considerable attention, and it has been suggested that these variations are regulated and serve to fine-tune protein forms by the addition or removal of a single amino acid. In this paper we first show that in-frame exon length variations are generally overrepresented and that this overrepresentation can be quantitatively explained by the effect of nonsense-mediated decay. Our analysis allows us to estimate that about 50% of frame-shifted coding transcripts are targeted by nonsense-mediated decay. Second, we show that a simple physical model, which assumes that the splicing machinery stochastically binds to nearby splice sites in proportion to the affinities of the sites, correctly predicts the relative abundances of different small length variations at both exon boundaries. Finally, using the same simple physical model, we show that for NAGNAG sites, the difference in affinities of the neighboring sites for the splicing machinery accurately predicts whether splicing will occur only at the first site, only at the second site, or at both sites, so that three-nucleotide splice variants are likely to occur. Our analysis thus suggests that small exon length variations are the result of stochastic binding of the spliceosome at neighboring splice sites. Small exon length variations occur when there are nearby alternative splice sites that have similar affinity for the splicing machinery.
It has recently become clear that splice variation affects most mammalian genes. It is, however, less clear to what extent these splice variations are functional and regulated by the cell, as opposed to simply being a result of noise in the splicing process.
One of the most frequently observed forms of splice variation is a small variation in exon length, in which the boundary of an exon is shifted by a small amount between different transcripts. In this work the authors study the statistics of these splice variations in detail, and the results suggest that these variations are mostly the result of noise in the splicing process. In particular, they propose a simple physical model in which the last step of splicing involves the sequence-specific binding of the splicing machinery to the splice site. In this model, small length variations can occur when there are nearby splice sites with comparable affinity for the splicing machinery. The authors show that this model not only accurately predicts the relative abundances of different splice variations but also predicts which splice sites are likely to undergo small exon length variations.
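The stochastic binding idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each candidate splice site has a log-affinity score (for example, one derived from a position weight matrix) and that the probability of using a site is proportional to the exponential of that score. The function names `site_usage` and `classify_tandem`, the example scores, and the 5% cutoff for calling a site effectively unused are all hypothetical choices made for illustration.

```python
import math


def site_usage(log_affinities):
    """Probability that each nearby site is used.

    Assumption: usage is proportional to exp(log-affinity), i.e. to the
    affinity itself, as in a simple competitive binding picture.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_affinities)
    weights = [math.exp(score - top) for score in log_affinities]
    total = sum(weights)
    return [w / total for w in weights]


def classify_tandem(score_first, score_second, minor_threshold=0.05):
    """Classify a tandem (e.g. NAGNAG) acceptor from its two scores.

    minor_threshold is a hypothetical cutoff: a site used less often than
    this is treated as effectively unused.
    """
    p_first, p_second = site_usage([score_first, score_second])
    if p_second < minor_threshold:
        return "first site only"
    if p_first < minor_threshold:
        return "second site only"
    return "both sites"


if __name__ == "__main__":
    # Hypothetical scores: similar affinities give a mixture of variants,
    # a large difference gives splicing at a single site.
    print(site_usage([1.0, 0.5]))
    print(classify_tandem(1.0, 0.5))
    print(classify_tandem(5.0, 0.0))
```

Under these assumptions, only the difference between the two scores matters: sites with comparable affinity both get used, while a large difference drives nearly all splicing to the stronger site, which mirrors the three outcomes described for NAGNAG sites.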