Accurate splice site prediction using support vector machines
Open Access
- 21 December 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (S10), 1-16
- https://doi.org/10.1186/1471-2105-8-s10-s7
Abstract
Splice site recognition requires solving two classification problems: discriminating true from decoy splice sites, once for acceptor sites and once for donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks. In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments that compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods, including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be incorporated into a gene finder. Data, splits, additional information on the model selection, the whole-genome predictions, and the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.
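The abstract names the weighted degree kernel but does not define it. As a rough illustration, the sketch below follows the form commonly given in the literature: for two equal-length sequences, count the substrings of length d that match at the same position, for d = 1 up to a maximum degree D, and weight each count by beta_d = 2(D - d + 1) / (D(D + 1)). The function names, the default degree, and the toy sequences are assumptions made for this example; they are not taken from the paper or from its stand-alone tool.

```python
def wd_weights(degree):
    """Weights beta_d = 2 * (D - d + 1) / (D * (D + 1)) for d = 1..D.

    Shorter substrings get larger weights, and the weights sum to 1.
    """
    norm = degree * (degree + 1)
    return [2.0 * (degree - d + 1) / norm for d in range(1, degree + 1)]


def weighted_degree_kernel(x, y, degree=3):
    """Weighted degree kernel between two equal-length sequences.

    For each substring length d, counts the positions at which x and y
    contain the same substring of length d, then sums the counts weighted
    by beta_d. This is an illustrative, unoptimized implementation.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for d, beta in enumerate(wd_weights(degree), start=1):
        # Positions 0 .. len(x) - d each start a substring of length d.
        matches = sum(
            1 for start in range(len(x) - d + 1)
            if x[start:start + d] == y[start:start + d]
        )
        total += beta * matches
    return total


if __name__ == "__main__":
    # Toy windows; real splice site windows would be much longer.
    a = "ACGT"
    b = "ACGA"
    print(weighted_degree_kernel(a, a, degree=2))
    print(weighted_degree_kernel(a, b, degree=2))
```

A kernel of this kind compares sequences position by position, which suits signals such as splice sites that sit at a fixed offset within the window. In practice the resulting kernel matrix would be passed to an SVM solver; that step is omitted here.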