Better prediction of sub‐cellular localization by combining evolutionary and structural information
- 22 October 2003
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 53 (4) , 917-930
- https://doi.org/10.1002/prot.10507
Abstract
The native sub‐cellular compartment of a protein is one aspect of its function. Thus, predicting localization is an important step toward predicting function. Short zip code‐like sequence fragments regulate some of the shuttling between compartments. Cataloguing and predicting such motifs is the most accurate means of determining localization in silico. However, only a few motifs are currently known, and not all trafficking appears to be regulated in this way. The amino acid composition of a protein correlates with its localization, and all general prediction methods have exploited this observation. Here, we explored the evolutionary information contained in multiple alignments and aspects of protein structure to predict localization in the absence of homology and targeting motifs. Our final system combined statistical rules and a variety of neural networks to achieve an overall four‐state accuracy above 65%, a significant improvement over systems using only composition. The system was at its best for extra‐cellular and nuclear proteins; it was significantly less accurate than TargetP for mitochondrial proteins. Interestingly, all methods that were developed on SWISS‐PROT sequences failed grossly when fed with sequences from proteins of known structure taken from PDB. We therefore developed two separate systems: one for proteins of known structure and one for proteins of unknown structure. Finally, we applied the PDB‐based system along with homology‐based inferences and automatic text analysis to annotate all eukaryotic proteins in the PDB (http://cubic.bioc.columbia.edu/db/LOC3D). We imagine that this pilot method, certainly in combination with similar tools, may be valuable for target selection in structural genomics. Proteins 2003;53:000–000.
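The composition feature that the abstract refers to can be illustrated with a short sketch. This is not the authors' implementation: the function name `amino_acid_composition`, the example sequence, and the decision to ignore non‐standard residue codes are assumptions made purely for illustration. It only shows how a sequence is reduced to a 20‐value vector of residue fractions, the kind of input a composition‐based predictor would consume.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in `sequence`.

    Hypothetical helper: non-standard codes (e.g. X, B, Z) are ignored,
    which is an assumption of this sketch, not something stated in the paper.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Counter returns 0 for residues that never occur, so every key is present.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up example sequence, used only to show the call.
composition = amino_acid_composition("MKKLLPTAAAGLLLLAAQPAMA")
```

The resulting dictionary always has 20 entries that sum to 1, so sequences of different lengths map to vectors of the same size.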