Assessing a novel approach for predicting local 3D protein structures from sequence
- 29 December 2005
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 62 (4), 865-880
- https://doi.org/10.1002/prot.20815
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly. Proteins 2006.
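The expert-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot window encoding, the random placeholder weights, the fixed `top_k` cutoff (the abstract reports a variable number of candidates, 4.2 on average), and all function names are assumptions made for illustration only. Only the fragment length (11 residues), the library size (120 prototypes), the use of one logistic model per prototype, and the 2.5 Å success threshold come from the abstract.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 11          # fragment length stated in the abstract
N_PROTOTYPES = 120   # library size stated in the abstract


def one_hot(window):
    """Encode a sequence window as a flat 0/1 vector (assumed encoding)."""
    vec = [0.0] * (len(window) * len(AMINO_ACIDS))
    for pos, aa in enumerate(window):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def expert_probability(weights, bias, features):
    """One logistic expert: sigmoid(w . x + b) for a single prototype."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_prototypes(experts, window, top_k):
    """Score the window with every expert and return the top_k
    (prototype_index, probability) pairs, best first."""
    features = one_hot(window)
    scored = [
        (expert_probability(w, b, features), idx)
        for idx, (w, b) in enumerate(experts)
    ]
    scored.sort(reverse=True)
    return [(idx, p) for p, idx in scored[:top_k]]


def is_success(candidate_rmsds, threshold=2.5):
    """Success criterion from the abstract: at least one proposed
    prototype lies closer than 2.5 A (C-alpha RMSD) to the true structure."""
    return any(r < threshold for r in candidate_rmsds)


# Placeholder experts with random weights; real weights would be fitted
# by logistic regression on a library of encoded fragments.
rng = random.Random(0)
n_features = WINDOW * len(AMINO_ACIDS)
experts = [
    ([rng.gauss(0, 0.1) for _ in range(n_features)], rng.gauss(0, 0.1))
    for _ in range(N_PROTOTYPES)
]

# Hypothetical 11-residue target window.
candidates = rank_prototypes(experts, "MKTAYIAKQRQ", top_k=4)
for idx, prob in candidates:
    print(f"prototype {idx:3d}  p = {prob:.3f}")
```

In this sketch each expert is scored independently, so the probabilities need not sum to one across prototypes; only their ranking is used to select candidates.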