SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data
Open Access
- 28 March 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (11), 1410-1417
- https://doi.org/10.1093/bioinformatics/btm115
Abstract
Motivation: Knowing the localization of a protein within the cell helps elucidate its role in biological processes, its function and its potential as a drug target. Thus, subcellular localization prediction is an active research area. Numerous localization prediction systems are described in the literature; some focus on specific localizations or organisms, while others attempt to cover a wide range of localizations.
Results: We introduce SherLoc, a new comprehensive system for predicting the localization of eukaryotic proteins. It integrates several types of sequence and text-based features. Although SherLoc applies the widely used support vector machines (SVMs), its main novelty lies in the way in which it selects its text sources and features, and integrates those with sequence-based features. We test SherLoc on previously used datasets, as well as on a new set devised specifically to test its predictive power, and show that SherLoc consistently improves on previously reported results. We also report the results of applying SherLoc to a large set of yet-unlocalized proteins.
Availability: SherLoc, along with Supplementary Information, is available at: http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc/
Contact: shatkay@cs.queensu.ca
Supplementary information: Supplementary data are available at Bioinformatics online.