A machine learning strategy to identify candidate binding sites in human protein-coding sequence
Open Access
- 26 September 2006
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 7 (1), 419
- https://doi.org/10.1186/1471-2105-7-419
Abstract
The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is difficult because exon sequences are also constrained by their functional role in coding for proteins.

We have trained a computational model able to detect signals in coding exons that seem to be orthogonal to the sequences' primary function of coding for proteins. This strategy identified a collection of motifs, including several previously reported splice enhancer elements. Although trained only on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence. We believe that many of the motifs detected here represent binding sites for previously unrecognized proteins that influence RNA splicing, as well as other regulatory elements.
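The abstract does not specify the model or its features, so the following is only a generic sketch of the idea of reading candidate motifs out of a trained sequence classifier. It assumes overlapping k-mer frequencies as features and uses plain logistic regression (a swapped-in technique chosen for brevity, not necessarily the authors' method). The k-mer length `K`, the functions `kmer_features`, `train_logistic`, `predict` and `top_motifs`, and the toy sequences are all hypothetical illustrations, not data or code from the paper.

```python
import math
from collections import Counter

K = 3  # hypothetical k-mer length; not taken from the paper


def kmer_features(seq, k=K):
    """Relative frequency of each overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return {kmer: c / total for kmer, c in counts.items()}


def sigmoid(z):
    z = max(min(z, 35.0), -35.0)  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, feats):
    """Linear score: bias plus the weighted sum of k-mer frequencies."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_logistic(examples, epochs=200, lr=0.5):
    """Stochastic-gradient logistic regression on sparse k-mer features.

    examples: list of (sequence, label), label 1 = exon, 0 = background.
    """
    data = [(kmer_features(s), y) for s, y in examples]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in data:
            err = y - sigmoid(score(weights, bias, feats))
            bias += lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) + lr * err * v
    return weights, bias


def predict(weights, bias, seq):
    """Probability that seq resembles the positive (exon) class."""
    return sigmoid(score(weights, bias, kmer_features(seq)))


def top_motifs(weights, n=5):
    """k-mers with the largest positive weights: the candidate 'motifs'."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


# Hypothetical toy data: purine-rich "exon" strings vs. T/C-rich "background".
TOY_EXONS = ["GAAGAAGACGAAGAA", "AGAAGAAGGAAGAAC", "GGAAGAAGAAGCAGA"]
TOY_BACKGROUND = ["TTTCTTTCCTTATTT", "CTTTTCCTTTCTTAT", "TTCCTTTTTCTCTTT"]

if __name__ == "__main__":
    examples = [(s, 1) for s in TOY_EXONS] + [(s, 0) for s in TOY_BACKGROUND]
    w, b = train_logistic(examples)
    print(top_motifs(w, 3))
    print(round(predict(w, b, "GAAGAAGAAGAA"), 3))
```

Any linear classifier, including a linear support vector machine, assigns one weight per k-mer, so ranking those weights is a simple way to surface candidate motifs. Note that this sketch does nothing to separate splicing signals from the protein-coding constraint the abstract highlights as the main difficulty; a real analysis would need some control for codon usage and reading frame.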