Large-Scale Predictions of Gram-Negative Bacterial Protein Subcellular Locations
- 28 October 2006
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 5 (12) , 3420-3428
- https://doi.org/10.1021/pr060404b
Abstract
Many species of Gram-negative bacteria are pathogens that can cause disease in a host organism. This pathogenic capability is usually associated with certain components of Gram-negative cells. Therefore, an automated method for fast and reliable prediction of Gram-negative protein subcellular location would allow us not only to annotate gene products in a timely manner, but also to screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when many location sites are involved and when query proteins have no significant homology to proteins of known subcellular location. PSORT-B, a recently updated version of PSORT that is widely used for predicting Gram-negative protein subcellular location, covers only five location sites. In addition, the data set used to train PSORT-B contains many proteins with high sequence identity within the same location group and may therefore carry a strong homology bias. To overcome these problems, a new predictor called "Gneg-PLoc" was developed. It fuses many basic classifiers, each trained on a stringent data set in which proteins within the same location group share strictly less than 25% sequence identity with one another. The new predictor covers eight subcellular locations: cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. Compared with PSORT-B, the new predictor not only covers more subcellular locations but also yields markedly higher success rates. Gneg-PLoc is available as a Web server at http://202.120.37.186/bioinf/Gneg. For researchers in related fields, a downloadable file at the same Web site lists the results produced by Gneg-PLoc for 49 907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotation or are annotated with uncertain terms.
The large-scale results will be updated twice a year to cover new Gram-negative bacterial protein entries and to reflect new developments in Gneg-PLoc.
Keywords: Gram-negative • Subcellular compartment • Gene ontology • Amphiphilic pseudo amino acid composition • Fusion • K-nearest neighbor rule
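The abstract describes fusing many basic classifiers, and the keywords name the K-nearest neighbor rule. The Python sketch below illustrates only that general idea, classifier fusion by majority vote over several K-nearest-neighbor classifiers; it is not Gneg-PLoc's implementation. It assumes plain 20-dimensional amino acid composition in place of the amphiphilic pseudo amino acid composition the paper uses, Euclidean distance, and basic classifiers that differ only in k. The function names and the `ks` parameter are hypothetical.

```python
from collections import Counter
import math

# The 20 standard amino acids; any other letter in a sequence is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of seq.

    Assumption: plain composition stands in for amphiphilic pseudo
    amino acid composition, which this sketch does not implement.
    """
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, training, k):
    """Majority vote among the k training vectors nearest to query.

    training is a list of (feature_vector, location_label) pairs.
    """
    neighbours = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def fused_predict(query_seq, training_seqs, ks=(1, 3, 5)):
    """Fuse several hypothetical basic KNN classifiers by majority vote.

    Assumption: each basic classifier differs only in k; the paper's
    basic classifiers are built differently.
    """
    query = aa_composition(query_seq)
    training = [(aa_composition(s), label) for s, label in training_seqs]
    votes = Counter(knn_predict(query, training, k) for k in ks)
    return votes.most_common(1)[0][0]
```

Majority voting is used here because it is the simplest fusion rule; weighted votes or distance-weighted neighbors are common alternatives.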