Sequence Information for the Splicing of Human Pre-mRNA Identified by Support Vector Machine Classification
Open Access
- 1 December 2003
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (12) , 2637-2650
- https://doi.org/10.1101/gr.1679003
Abstract
Vertebrate pre-mRNA transcripts contain many sequences that resemble splice sites on the basis of agreement to the consensus, yet these more numerous false splice sites are usually completely ignored by the cellular splicing machinery. Even at the level of exon definition, pseudo exons defined by such false splice sites outnumber real exons by an order of magnitude. We used a support vector machine to discover sequence information that could be used to distinguish real exons from pseudo exons. This machine learning tool led to the definition of potential branch points, an extended polypyrimidine tract, and C-rich and TG-rich motifs in a region limited to 50 nt upstream of constitutively spliced exons. C-rich sequences were also found in a region extending to 80 nt downstream of exons, along with G-triplet motifs. In addition, it was shown that combinations of three bases within the splice donor consensus sequence were more effective than consensus values in distinguishing real from pseudo splice sites; two-way base combinations were optimal for distinguishing 3′ splice sites. These data also suggest that interactions between two or more of these elements may contribute to exon recognition, and provide candidate sequences for assessment as intronic splicing enhancers.
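The abstract reports that three-base combinations within the donor consensus separate real from pseudo splice sites better than consensus scores. The sketch below illustrates what such "k-way base combination" features could look like; it is a minimal, hypothetical encoding and not the authors' method. It assumes a 9-nt donor window (3 exonic + 6 intronic positions), binary indicator features for every set of k (position, base) pairs, and hypothetical helper names `combination_features` and `shared_combinations`. The inner product of two such feature vectors counts the k-position subsets on which the sequences agree, which is closely related to what a polynomial kernel over one-hot encoded sequences computes implicitly.

```python
from itertools import combinations

BASES = "ACGT"


def combination_features(seq: str, k: int) -> set[tuple[tuple[int, str], ...]]:
    """Return every k-way conjunction of (position, base) pairs present in seq.

    Hypothetical encoding: each feature is a tuple of k (position, base) pairs,
    ordered by position, so a 9-nt window yields C(9, k) features per sequence.
    """
    seq = seq.upper()
    if any(base not in BASES for base in seq):
        raise ValueError("sequence must contain only A, C, G, T")
    positions = list(enumerate(seq))
    return set(combinations(positions, k))


def shared_combinations(a: str, b: str, k: int) -> int:
    """Inner product of the binary k-way feature vectors of two aligned windows.

    Two windows share a feature only if they agree at all k of its positions,
    so the result equals C(m, k), where m is the number of matching positions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same window length")
    return len(combination_features(a, k) & combination_features(b, k))


if __name__ == "__main__":
    consensus_like = "CAGGTAAGT"  # assumed 9-nt donor window: CAG|GTAAGT
    candidate = "AAGGTGAGT"       # differs from consensus_like at 2 positions
    print(len(combination_features(consensus_like, 3)))
    print(shared_combinations(consensus_like, candidate, 3))
```

Here the two windows agree at 7 of 9 positions, so each has 84 three-way features (C(9, 3)) and they share 35 of them (C(7, 3)). A classifier trained on such features can weight specific base combinations rather than treating each position independently, as a position-by-position consensus score does; the actual feature set and kernel used in the study may differ.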