Identification of related proteins with weak sequence identity using secondary structure information
- 1 April 2001
- journal article
- Published by Wiley in Protein Science
- Vol. 10 (4) , 788-797
- https://doi.org/10.1110/ps.30001
Abstract
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated proteins sequences in the range of 10%-30% sequence identity. Pairwise comparison of secondary structures have been measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structures likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E: value), we show that distantly related protein sequences (even with <20% identity) can be still identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.Keywords
This publication has 28 references indexed in Scilit:
- Enhanced genome annotation using structural profiles in the program 3D-PSSM 1 1Edited by J. ThorntonJournal of Molecular Biology, 2000
- Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methodsJournal of Molecular Biology, 1998
- Intermediate sequences increase the detection of homology between sequencesJournal of Molecular Biology, 1997
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choiceNucleic Acids Research, 1994
- Enlarged representative set of protein structuresProtein Science, 1994
- Redefining the goals of protein secondary structure predictionJournal of Molecular Biology, 1994
- Comparative Protein Modelling by Satisfaction of Spatial RestraintsJournal of Molecular Biology, 1993
- Prediction of Protein Secondary Structure at Better than 70% AccuracyJournal of Molecular Biology, 1993
- Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical featuresBiopolymers, 1983