CASP2 knowledge-based approach to distant homology recognition and fold prediction in CASP4
- 1 January 2001
- journal article
- research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 45 (S5), 76–85
- https://doi.org/10.1002/prot.10037
Abstract
In 1996, in CASP2, we presented a semimanual approach to protein structure prediction aimed at recognizing probable distant homology, where it existed, between a given target protein and a protein of known structure (Murzin and Bateman, 1997). Central to our method was knowledge of all known structural and probable evolutionary relationships among proteins of known structure, as classified in the SCOP database (Murzin et al., J Mol Biol 1995;247:536–540). We demonstrated that a knowledge-based approach could compete successfully with the best computational methods of the time in correctly recognizing the target protein fold. Four years later, in CASP4, we applied essentially the same knowledge-based approach to distant homology recognition, concentrating our effort on improving the completeness and alignment accuracy of our models. The manifold increase in available sequence and structure data worked to our advantage, as did the experience and expertise gained through classifying these data. In particular, we were able to build most of our models from several distantly related structures rather than from a single parent structure, and we could use more superfamily-characteristic features to refine our alignments. Our predictions for each of the attempted distant homology recognition targets ranked among the top few for that target, with the predictions for the hypothetical protein HI0065 (T0104) and the C-terminal domain of the ABC transporter MalK (T0121C) being particularly successful. We also attempted to predict the folds of some targets tentatively assigned to new superfamilies. The average quality of our fold predictions was far lower than that of our distant homology recognition models, but for two targets, chorismate lyase (T0086) and Appr>p cyclic phosphodiesterase (T0094), our predictions achieved the top ranking.
Proteins 2001;Suppl 5:76–85.
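The split the abstract draws between distant homology recognition and fold prediction follows the SCOP hierarchy: membership of a known superfamily implies probable common ancestry, whereas a target tentatively assigned to a new superfamily leaves only fold-level structural similarity to go on. The Python sketch below illustrates that reading of SCOP concise classification strings (sccs, written class.fold.superfamily.family). It is not the authors' semimanual procedure; `Sccs`, `relationship` and `superfamily_relatives` are hypothetical names, and the classification codes in the example are illustrative rather than taken from any CASP4 target.

```python
from typing import Dict, List, NamedTuple


class Sccs(NamedTuple):
    """A SCOP concise classification string such as 'c.1.8.1',
    split into class, fold, superfamily and family."""
    sclass: str
    fold: int
    superfamily: int
    family: int

    @classmethod
    def parse(cls, text: str) -> "Sccs":
        sclass, fold, superfamily, family = text.split(".")
        return cls(sclass, int(fold), int(superfamily), int(family))


def relationship(a: Sccs, b: Sccs) -> str:
    """Report the deepest SCOP level two domains share.

    In SCOP, a shared superfamily (or family) signals probable common
    ancestry, whereas a shared fold alone signals structural similarity
    without evidence of homology.
    """
    if a.sclass != b.sclass:
        return "different class"
    if a.fold != b.fold:
        return "same class only"
    if a.superfamily != b.superfamily:
        return "same fold (structural similarity)"
    if a.family != b.family:
        return "same superfamily (probable distant homology)"
    return "same family (clear homology)"


def superfamily_relatives(hit: str, library: Dict[str, Sccs]) -> List[str]:
    """Hypothetical helper: every library domain in the same superfamily as
    `hit` (including `hit` itself), i.e. a pool of templates for a
    multi-template model instead of a single parent structure."""
    h = library[hit]
    key = (h.sclass, h.fold, h.superfamily)
    return sorted(
        name for name, s in library.items()
        if (s.sclass, s.fold, s.superfamily) == key
    )


# Illustrative classification codes only; not real target assignments.
library = {
    "dom1": Sccs.parse("c.1.8.1"),
    "dom2": Sccs.parse("c.1.8.3"),
    "dom3": Sccs.parse("c.1.2.1"),
    "dom4": Sccs.parse("b.1.1.1"),
}

print(relationship(library["dom1"], library["dom2"]))  # same superfamily (probable distant homology)
print(relationship(library["dom1"], library["dom3"]))  # same fold (structural similarity)
print(superfamily_relatives("dom1", library))          # ['dom1', 'dom2']
```

Grouping candidate templates by superfamily rather than by family is a design choice meant to echo the abstract's point about modelling from several distantly related structures; a working pipeline would also need sequence searching and alignment refinement, which this sketch leaves out.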
References (26 indexed; selection shown):
- Crystal structure of the M-fragment of alpha-catenin: implications for modulation of cell adhesion. The EMBO Journal, 2001.
- A new mutation in spo0A with intragenic suppressors in the effector domain. FEMS Microbiology Letters, 2000.
- The Pfam Protein Families Database. Nucleic Acids Research, 2000.
- Structure classification-based assessment of CASP3 predictions for the fold recognition targets. Proteins: Structure, Function, and Bioinformatics, 1999.
- The proofreading domain of Escherichia coli DNA polymerase I and other DNA and/or RNA exonuclease domains. Nucleic Acids Research, 1997.
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997.
- Distant homology recognition using structural classification of proteins. Proteins: Structure, Function, and Bioinformatics, 1997.
- New folds for all-β proteins. Structure, 1993.
- A new family of bacterial regulatory proteins. FEMS Microbiology Letters, 1991.
- Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallographica Section A: Foundations of Crystallography, 1991.