Euk-mPLoc: A Fusion Classifier for Large-Scale Eukaryotic Protein Subcellular Location Prediction by Incorporating Multiple Sites
- 31 March 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 6 (5) , 1728-1734
- https://doi.org/10.1021/pr060635i
Abstract
One of the critical challenges in predicting protein subcellular localization is how to deal with proteins that have multiple location sites. So far, no efforts have been made in this regard except for one study focused only on the proteins of budding yeast. Most existing predictors either exclude multiple-site proteins from consideration or assume that they do not exist. In fact, proteins may simultaneously exist at, or move between, two or more different subcellular locations. For instance, according to the Swiss-Prot database (version 50.7, released 19-Sept-2006), among the 33 925 eukaryotic protein entries that have experimentally observed subcellular location annotations, 2715 have multiple location sites, meaning that about 8% bear the multiplex feature. Proteins with multiple locations or dynamic features of this kind are particularly interesting because they may have special biological functions of interest to investigators in both basic research and drug discovery. Meanwhile, according to the same Swiss-Prot database, the total number of eukaryotic protein entries (excluding those annotated with “fragment” or those with fewer than 50 amino acids) is 90 909, leaving a gap of (90 909 − 33 925) = 56 984 entries for which no knowledge is available about their subcellular locations. Although computational approaches can be used to predict the missing information, all existing methods for predicting eukaryotic protein subcellular localization are so far limited to the case of a single location site. To overcome this barrier, a new ensemble classifier, named Euk-mPLoc, was developed that can also handle the case of multiple location sites. Euk-mPLoc is freely accessible to the public as a Web server at http://202.120.37.186/bioinf/euk-multi.
Meanwhile, to support those working in the relevant areas, Euk-mPLoc has been used to identify all eukaryotic protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited at the same Web site as a downloadable Microsoft Excel file named “Tab_Euk-mPLoc.xls”. Furthermore, to include new entries of eukaryotic proteins and reflect the continuous development of Euk-mPLoc in both coverage scope and prediction accuracy, we will update the downloadable file and the predictor in a timely manner, and keep users informed by publishing a short note in the Journal and making an announcement on the Web page.
Keywords: Large-scale prediction • Eukaryotic protein • Multiple locations • Ensemble classifier • Fusion • Optimal threshold • Euk-mPLoc
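The counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the unannotated fraction it reports is derived from the quoted counts rather than stated in the abstract.

```python
# Counts quoted in the abstract (Swiss-Prot version 50.7).
ANNOTATED = 33_925    # entries with experimentally observed location annotations
MULTI_SITE = 2_715    # annotated entries with multiple location sites
TOTAL = 90_909        # eukaryotic entries, excluding fragments and short sequences


def summarize(annotated: int, multi_site: int, total: int) -> dict:
    """Return the derived quantities discussed in the abstract."""
    unannotated = total - annotated  # the "gap" of entries with no location data
    return {
        "multi_site_fraction": multi_site / annotated,
        "unannotated": unannotated,
        "unannotated_fraction": unannotated / total,  # derived here, not in the abstract
    }


stats = summarize(ANNOTATED, MULTI_SITE, TOTAL)
print(f"multi-site share of annotated entries: {stats['multi_site_fraction']:.1%}")
print(f"entries without location annotation:   {stats['unannotated']}")
print(f"share of all entries without annotation: {stats['unannotated_fraction']:.1%}")
```

The multi-site share comes out at roughly 8%, and the gap equals the 56 984 entries stated in the abstract.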