A hierarchical approach to all‐atom protein loop prediction
- 5 March 2004
- journal article
- research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 55 (2), 351-367
- https://doi.org/10.1002/prot.10613
Abstract
The application of all‐atom force fields (and explicit or implicit solvent models) to protein homology‐modeling tasks such as side‐chain and loop prediction remains challenging both because of the expense of the individual energy calculations and because of the difficulty of sampling the rugged all‐atom energy surface. Here we address this challenge for the problem of loop prediction through the development of numerous new algorithms, with an emphasis on multiscale and hierarchical techniques. As a first step in evaluating the performance of our loop prediction algorithm, we have applied it to the problem of reconstructing loops in native structures; we also explicitly include crystal packing to provide a fair comparison with crystal structures. In brief, large numbers of loops are generated by using a dihedral angle‐based buildup procedure followed by iterative cycles of clustering, side‐chain optimization, and complete energy minimization of selected loop structures. We evaluate this method by using the largest test set yet used for validation of a loop prediction method, with a total of 833 loops ranging from 4 to 12 residues in length. Average/median backbone root‐mean‐square deviations (RMSDs) to the native structures (superimposing the body of the protein, not the loop itself) are 0.42/0.24 Å for 5‐residue loops, 1.00/0.44 Å for 8‐residue loops, and 2.47/1.83 Å for 11‐residue loops. Median RMSDs are substantially lower than the averages because of a small number of outliers; the causes of these failures are examined in some detail, and many can be attributed to errors in assignment of protonation states of titratable residues, omission of ligands from the simulation, and, in a few cases, probable errors in the experimentally determined structures. When these obvious problems in the data sets are filtered out, average RMSDs to the native structures improve to 0.43 Å for 5‐residue loops, 0.84 Å for 8‐residue loops, and 1.63 Å for 11‐residue loops.
In the vast majority of cases, the method locates energy minima that are lower than or equal to that of the minimized native loop, thus indicating that sampling rarely limits prediction accuracy. The overall results are, to our knowledge, the best reported to date, and we attribute this success to the combination of an accurate all‐atom energy function, efficient methods for loop buildup and side‐chain optimization, and, especially for the longer loops, the hierarchical refinement protocol. Proteins 2004;55:351–367.
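The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes loop backbone coordinates are already expressed in a common frame obtained by superimposing the body of the protein, so no fitting is done on the loop itself. The function names, the 2.0 Å failure cutoff, the energy tolerance, and the toy RMSD values are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, median


def loop_backbone_rmsd(predicted, native):
    """RMSD over loop backbone atoms without re-fitting on the loop.

    Assumption: both coordinate lists are (x, y, z) tuples already placed in
    the frame obtained by superimposing the rest of the protein (the "body").
    """
    if len(predicted) != len(native) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, native))
    return math.sqrt(squared / len(predicted))


def classify_outcome(rmsd, e_predicted, e_native_min, rmsd_cutoff=2.0, tol=1e-6):
    """Hypothetical diagnostic for a single loop prediction.

    If the prediction is far from the native loop but its energy is no higher
    than that of the minimized native loop, sampling found a competitive
    minimum, so the miss points to the energy model or the input data.
    Otherwise the search failed to reach the native basin.
    The cutoff and tolerance are illustrative assumptions.
    """
    if rmsd <= rmsd_cutoff:
        return "success"
    if e_predicted <= e_native_min + tol:
        return "energy-function error"
    return "sampling error"


# Toy per-loop RMSDs (made up): a single outlier raises the mean
# well above the median.
toy_rmsds = [0.20, 0.25, 0.30, 3.00]
summary = {"mean": mean(toy_rmsds), "median": median(toy_rmsds)}
```

Reporting both `summary["mean"]` and `summary["median"]` mirrors the average/median pairs quoted in the abstract, and `classify_outcome` separates misses where the search found a competitive energy from misses where it did not.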