POIMs: positional oligomer importance matrices—understanding support vector machine-based signal detectors
Open Access
- 1 July 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (13), i6-i14
- https://doi.org/10.1093/bioinformatics/btn170
Abstract
Motivation: At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences. Frequently the most accurate classifiers are obtained by training support vector machines (SVMs) with complex sequence kernels. However, a cumbersome shortcoming of SVMs is that their learned decision rules are very hard for humans to understand and cannot easily be related to biological facts.
Results: To make SVM-based sequence classifiers more accessible and profitable, we introduce the concept of positional oligomer importance matrices (POIMs) and propose an efficient algorithm for their computation. In contrast to the raw SVM feature weighting, POIMs take into account the underlying correlation structure of k-mer features induced by overlaps of related k-mers. POIMs can be seen as a powerful generalization of sequence logos: they make it possible to capture and visualize sequence patterns that are relevant for the investigated biological phenomena.
Availability: All source code, datasets, tables and figures are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/POIM.
Contact: Soeren.Sonnenburg@first.fraunhofer.de
Supplementary information: Supplementary data are available at Bioinformatics online.
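The contrast the abstract draws between raw SVM feature weights and POIMs can be illustrated with a small sketch. It assumes that the importance of a k-mer z at position j is measured as the expected change in classifier score when z is fixed at j and all other positions are drawn uniformly at random. That definition, the toy weight table, and the names `score` and `poim_entry` are assumptions made for illustration and are not stated in the abstract. The brute-force enumeration below is also not the efficient algorithm the authors propose; it is only feasible for very short sequences.

```python
from itertools import product

ALPHABET = "ACGT"

def score(seq, weights):
    """Linear k-mer scoring: sum the weights of all (k-mer, position)
    features that are present in seq."""
    return sum(w for (kmer, pos), w in weights.items()
               if seq[pos:pos + len(kmer)] == kmer)

def poim_entry(z, j, weights, length):
    """Assumed importance of k-mer z at position j:
    E[s(X) | X[j:j+len(z)] == z] - E[s(X)], with X uniform over all
    sequences of the given length (computed by full enumeration).
    Assumes z fits inside the sequence when placed at position j."""
    all_scores = []
    cond_scores = []
    for chars in product(ALPHABET, repeat=length):
        seq = "".join(chars)
        s = score(seq, weights)
        all_scores.append(s)
        if seq[j:j + len(z)] == z:
            cond_scores.append(s)
    return sum(cond_scores) / len(cond_scores) - sum(all_scores) / len(all_scores)

# Hypothetical toy model: the only weighted feature is the 2-mer "GA" at position 1.
weights = {("GA", 1): 1.0}
LENGTH = 4

q_ga = poim_entry("GA", 1, weights, LENGTH)  # the weighted 2-mer itself
q_g = poim_entry("G", 1, weights, LENGTH)    # overlapping 1-mer with no weight of its own
q_c = poim_entry("C", 1, weights, LENGTH)    # 1-mer that rules out "GA" at position 1

print(q_ga, q_g, q_c)
```

In this toy model the 1-mer "G" at position 1 has no raw weight at all, yet it receives a positive importance because it overlaps the weighted 2-mer "GA", while "C" at the same position receives a negative one because it excludes that 2-mer. This is the kind of overlap-induced dependence that raw feature weights do not show.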