ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes
Open Access
- 1 May 2007
- journal article
- research article
- Published by Springer Nature in Genome Biology
- Vol. 8 (5) , R68
- https://doi.org/10.1186/gb-2007-8-5-r68
Abstract
We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89% for sequences localized to a single organelle, and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.
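To illustrate the general idea of an n-gram-based Bayesian classifier over protein sequences, here is a minimal sketch of a naive Bayes model with add-alpha smoothing. It is not the ngLOC implementation: the names `ngrams` and `NGramNaiveBayes`, the parameters `n` and `alpha`, and the toy sequences and labels are all assumptions made for illustration, and the abstract does not specify the n-gram length, the smoothing scheme, or how multi-organelle proteins are handled.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping n-grams (length-n substrings) of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical naive Bayes classifier over sequence n-grams."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                          # n-gram length (assumed value)
        self.alpha = alpha                  # add-alpha smoothing constant (assumed)
        self.counts = defaultdict(Counter)  # class -> n-gram counts
        self.totals = Counter()             # class -> total n-grams seen
        self.class_docs = Counter()         # class -> number of training sequences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.class_docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, seq):
        """Return log P(class) + sum of log P(n-gram | class) for each class."""
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        grams = ngrams(seq, self.n)
        scores = {}
        for label in self.class_docs:
            score = math.log(self.class_docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy, made-up sequences and labels for illustration only (not real data).
train_seqs = ["MKKLLAAKKRKR", "KRKRPKKKRKV", "MLSLRQSIRFF", "MLRRSIRFFSLR"]
train_labels = ["nuclear", "nuclear", "mitochondrial", "mitochondrial"]

model = NGramNaiveBayes(n=2, alpha=1.0).fit(train_seqs, train_labels)
prediction = model.predict("PKKKRKVKR")
```

In this sketch, each class score is the log prior plus the summed log likelihoods of the query's n-grams, and the prediction is the class with the highest score. A classifier of this kind would normally be evaluated with k-fold cross-validation, as the abstract reports for ngLOC with k = 10.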