Structural similarity to link sequence space: New potential superfamilies and implications for structural genomics
- 1 May 2002
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 11 (5), 1101-1116
- https://doi.org/10.1110/ps.3950102
Abstract
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.
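The idea of scoring sequence identity after structural alignment can be illustrated with a minimal sketch. It is not the procedure of Murzin (1993b) or the extension described in the abstract. It assumes two pre-aligned rows with `-` for gaps, and it uses a plain binomial tail with a uniform per-position match probability of 0.05 (1 in 20 amino acids) as a simplified stand-in for a significance estimate. The function names, the toy sequences, and the 0.05 background are all illustrative assumptions.

```python
from math import comb


def aligned_identity(row_a: str, row_b: str) -> tuple[int, int]:
    """Count (identical, aligned) positions in two rows of an alignment.

    Columns where either row has a gap ('-') are skipped, so only
    structurally equivalent residue pairs are compared.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    identical = 0
    aligned = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


def binomial_tail(identical: int, aligned: int, p_match: float = 0.05) -> float:
    """Probability of at least `identical` matches in `aligned` positions.

    Assumption: each aligned position matches independently with
    probability `p_match`. Real residue frequencies are not uniform,
    so this is only a rough illustration.
    """
    return sum(
        comb(aligned, k) * p_match**k * (1 - p_match) ** (aligned - k)
        for k in range(identical, aligned + 1)
    )


# Toy rows (hypothetical, not taken from SCOP, SMART, or Pfam).
row_a = "MKV-LAGDT"
row_b = "MRVQLA-DS"
identical, aligned = aligned_identity(row_a, row_b)
p_value = binomial_tail(identical, aligned)
print(f"{identical}/{aligned} identical, tail probability {p_value:.2e}")
```

A small tail probability under this toy model only means the observed identity is unlikely under uniform random matching. A realistic estimate would need background residue frequencies and a correction for the number of comparisons made.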
This publication has 64 references indexed in Scilit:
- SCOP: A structural classification of proteins database for the investigation of sequences and structures. Published by Elsevier, 2006
- Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins (edited by J. Thornton). Journal of Molecular Biology, 2001
- Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains (edited by F. Cohen). Journal of Molecular Biology, 2001
- Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs. Journal of Molecular Biology, 2001
- Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science, 2000
- The Protein Data Bank. Nucleic Acids Research, 2000
- SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Research, 2000
- Structural Features can be Unconserved in Proteins with Similar Folds: An Analysis of Side-chain to Side-chain Contacts Secondary Structure and Accessibility. Journal of Molecular Biology, 1994
- Three-dimensional structure of the bifunctional enzyme phosphoribosylanthranilate isomerase: Indoleglycerolphosphate synthase from Escherichia coli refined at 2.0 Å resolution. Journal of Molecular Biology, 1992
- Basic local alignment search tool. Journal of Molecular Biology, 1990