Predicting Protein Cellular Localization Using a Domain Projection Method
Open Access
- 1 August 2002
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (8) , 1168-1174
- https://doi.org/10.1101/gr.96802
Abstract
We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a “small-world network”, linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of “bridging” domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment (“locale”), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
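The co-occurrence network and the role of “bridging” domains can be illustrated with a small Python sketch. It is not the authors' implementation: the protein and domain names are hypothetical placeholders rather than SMART entries, and an unweighted breadth-first search is swapped in as a simple way to count degrees of separation. The two-dimensional projection step is not reproduced here.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: protein name -> set of domains it contains.
# These names are illustrative placeholders, not real SMART domains.
proteins = {
    "protA": {"D1", "D2"},
    "protB": {"D2", "D3"},
    "protC": {"D3", "D4"},
    "protD": {"D5"},
}


def build_cooccurrence_graph(proteins):
    """Link two domains whenever they appear together in at least one protein."""
    graph = {}
    for domains in proteins.values():
        for domain in domains:
            graph.setdefault(domain, set())
        for a, b in combinations(sorted(domains), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph, source):
    """Breadth-first search: number of co-occurrence links from source
    to every domain reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


graph = build_cooccurrence_graph(proteins)
dist_from_d1 = degrees_of_separation(graph, "D1")
print(dist_from_d1)
```

In this toy data D1 and D3 never share a protein, yet both co-occur with D2, so D2 acts as a bridge and D3 ends up two links from D1. D5 occurs alone and is unreachable, much as only about half of the SMART domains fall within the connected network described in the abstract.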