Predicting fold novelty based on ProtoNet hierarchical classification
Open Access
- 11 November 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (7), 1020-1027
- https://doi.org/10.1093/bioinformatics/bti135
Abstract
Motivation: Structural genomics projects aim to solve a large number of protein structures with the ultimate objective of representing the entire protein space. The computational challenge is to identify and prioritize a small set of proteins with new, currently unknown, superfamilies or folds.
Results: We develop a method that assigns each protein a likelihood of it belonging to a new, yet undetermined, structural superfamily. The method relies on a variant of ProtoNet, an automatic hierarchical classification scheme of all protein sequences from SwissProt. Our results show that proteins that are remote from solved structures in the ProtoNet hierarchy are more likely to belong to new superfamilies. The results are validated against SCOP releases from recent years that account for about half of the solved structures known to date. We show that our new method and the representation of ProtoNet are superior in detecting new targets, compared to our previous method using ProtoMap classification. Furthermore, our method outperforms PSI-BLAST search in detecting potential new superfamilies.
Availability: An interactive tool implementing this method, named ProTarget, is available at http://www.protarget.cs.huji.ac.il. It can be used interactively to retrieve a list of candidate proteins for structural genomics projects. Supplementary material is available at http://www.protarget.cs.huji.ac.il/supplement
Contact: michall@cc.huji.ac.il
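The abstract does not give the scoring function, so the following is only an illustrative sketch of the underlying idea that proteins remote from solved structures in a cluster hierarchy make better candidates. It assumes a toy tree stored as a child-to-parent map and measures remoteness as the number of merge steps a protein must climb before its cluster contains a solved structure. The names `parent`, `solved`, `remoteness`, and `rank_candidates` are hypothetical and are not taken from ProtoNet or ProTarget.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy hierarchy (an assumption for illustration): child -> parent, root has parent None.
parent: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root", "B": "root",
    "A1": "A", "p3": "A",
    "p1": "A1", "p2": "A1",
    "p4": "B", "p5": "B",
}
proteins: List[str] = ["p1", "p2", "p3", "p4", "p5"]
solved: Set[str] = {"p1"}  # proteins with a known 3D structure


def clusters_with_structure(parent: Dict[str, Optional[str]],
                            solved: Set[str]) -> Set[str]:
    """Return every node that is a solved protein or has one beneath it."""
    marked: Set[str] = set()
    for protein in solved:
        node: Optional[str] = protein
        # Stop early once we reach an ancestor that is already marked.
        while node is not None and node not in marked:
            marked.add(node)
            node = parent[node]
    return marked


def remoteness(protein: str, marked: Set[str],
               parent: Dict[str, Optional[str]]) -> Optional[int]:
    """Count the steps up the tree until a cluster contains a solved structure."""
    steps = 0
    node: Optional[str] = protein
    while node is not None:
        if node in marked:
            return steps
        node = parent[node]
        steps += 1
    return None  # no solved structure anywhere in this tree


def rank_candidates(proteins: List[str], parent: Dict[str, Optional[str]],
                    solved: Set[str]) -> List[Tuple[str, int]]:
    """Rank unsolved proteins, most remote from any solved structure first."""
    marked = clusters_with_structure(parent, solved)
    scored = []
    for p in proteins:
        if p in solved:
            continue
        r = remoteness(p, marked, parent)
        if r is not None:
            scored.append((p, r))
    # Sort by remoteness descending, then by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranking = rank_candidates(proteins, parent, solved)
```

In this toy tree, `p4` and `p5` share no cluster with the solved protein below the root, so they rank ahead of `p2` and `p3`. A real scoring scheme would presumably also weigh how late in the clustering each merge occurs, which this step count ignores.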