Predicting protein subcellular location by fusing multiple classifiers
- 25 April 2006
- journal article
- research article
- Published by Wiley in Journal of Cellular Biochemistry
- Vol. 99 (2) , 517-527
- https://doi.org/10.1002/jcb.20879
Abstract
One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the context of the compartments that organize them in the cellular environment. Knowledge of the subcellular locations of proteins can provide key hints for revealing their functions and understanding how they interact with each other in cellular networks. Unfortunately, it is both time-consuming and expensive to determine the localization of an uncharacterized protein in a living cell purely by experiment. With the avalanche of newly found protein sequences emerging in the post-genomic era, we face a critical challenge: how to develop an automated method that identifies their subcellular locations quickly and reliably, so that they can be used in a timely manner for basic research and drug discovery. In view of this, an ensemble classifier was developed by fusing many basic individual classifiers through a voting system. Each of these basic classifiers was trained in a different dimension of the amphiphilic pseudo amino acid composition (Chou [2005] Bioinformatics 21: 10–19). As a demonstration, predictions were performed with the fusion classifier for proteins among the following 14 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. The overall success rates obtained via the resubstitution test, jackknife test, and independent dataset test were all significantly higher than those of the existing classifiers. It is anticipated that the novel ensemble classifier may also become a very useful vehicle for classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub-family, G-protein coupled receptor (GPCR) type, and structural class, among many others.
The fusion ensemble classifier will be available at www.pami.sjtu.edu.cn/people/hbshen. J. Cell. Biochem. 99: 517–527, 2006.
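The voting-based fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the basic classifiers, the weights, or the tie-breaking rule, so everything below is an assumption: `fuse_by_vote` is a hypothetical helper implementing a plain weighted majority vote, and each basic classifier is modeled as any callable that maps a feature vector to a location label.

```python
from typing import Callable, Dict, Optional, Sequence

# Assumption: a "basic classifier" is any callable that maps a feature
# vector to a predicted location label. The paper's actual classifiers
# are not described in the abstract.
Classifier = Callable[[Sequence[float]], str]


def fuse_by_vote(
    classifiers: Sequence[Classifier],
    features: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Return the label with the largest (weighted) vote total.

    Hypothetical helper: with no weights given, every classifier
    casts one equal vote. Ties go to the label that was voted for first.
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    if weights is None:
        weights = [1.0] * len(classifiers)
    if len(weights) != len(classifiers):
        raise ValueError("weights and classifiers must have the same length")

    tally: Dict[str, float] = {}
    for classify, weight in zip(classifiers, weights):
        label = classify(features)
        tally[label] = tally.get(label, 0.0) + weight

    # dicts keep insertion order and max() returns the first maximum,
    # so a tie resolves to the earliest label seen.
    return max(tally, key=lambda label: tally[label])


if __name__ == "__main__":
    # Toy stand-ins for trained classifiers; each ignores its input.
    toy = [lambda x: "nucleus", lambda x: "cytoplasm", lambda x: "nucleus"]
    print(fuse_by_vote(toy, [0.1, 0.2]))  # prints "nucleus"
```

In a setting like the one the abstract describes, each callable would presumably wrap a classifier trained on a different dimension of the amphiphilic pseudo amino acid composition, but that wiring is not shown here.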