Identification and application of the concepts important for accurate and reliable protein secondary structure prediction

Open Access

1 November 1996

journal article
research article
Published by Wiley in Protein Science

Vol. 5 (11) , 2298-2310
https://doi.org/10.1002/pro.5560051116

Abstract

A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per‐residue three‐state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequences, moments of conservation, auto‐correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto‐correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of >80%. Existing high‐accuracy prediction methods are “black‐box” predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium‐ to short‐length chains (≥90 residues and <170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three‐state accuracy of 72.4%, the highest accuracy reported for any prediction method.
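Two of the quantities named in the abstract can be illustrated with a short sketch. The abstract does not give the paper's own definitions, so the code below rests on assumptions: per‐residue three‐state accuracy is taken as the fraction of residues whose predicted state (helix H, strand E, or coil C) matches the observed state, and the moment of hydrophobicity is taken in the conventional Eisenberg‐style form, the magnitude of the vector sum of per‐residue hydrophobicities rotated by a fixed angle per residue. The function names `q3_accuracy` and `hydrophobic_moment` and the choice of hydrophobicity values are hypothetical and not taken from the paper.

```python
import math


def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) equals the observed state.

    Assumed definition of per-residue three-state accuracy; not quoted from the paper.
    """
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def hydrophobic_moment(values, angle_deg):
    """Magnitude of the hydrophobic moment over a window of hydrophobicity values.

    values    -- per-residue hydrophobicities (any scale; supplied by the caller)
    angle_deg -- assumed rotation per residue in degrees
                 (roughly 100 for an alpha-helix, 160-180 for a beta-strand)
    """
    delta = math.radians(angle_deg)
    # Sum the sine and cosine components of each residue's contribution.
    sin_sum = sum(h * math.sin(i * delta) for i, h in enumerate(values))
    cos_sum = sum(h * math.cos(i * delta) for i, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum)
```

For instance, `q3_accuracy("HHHECC", "HHHCCC")` counts five matching states out of six. With a 180° angle, an alternating hydrophobic/hydrophilic window such as `[1, -1]` gives a large moment, while a uniform window such as `[1, 1]` gives a moment close to zero, which is why such moments can signal strand‐like periodicity.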

This publication has 47 references indexed in Scilit:

Prediction of Protein Secondary Structure by Combining Nearest-neighbor Algorithms and Multiple Sequence Alignments
Journal of Molecular Biology, 1995
Redefining the goals of protein secondary structure prediction
Journal of Molecular Biology, 1994
Protein Secondary Structure Prediction Using Nearest-neighbor Methods
Journal of Molecular Biology, 1993
Prediction of Protein Secondary Structure at Better than 70% Accuracy
Journal of Molecular Biology, 1993
Amino acid preferences of small proteins
Journal of Molecular Biology, 1992
Improvements in protein secondary structure prediction by an enhanced neural network
Journal of Molecular Biology, 1990
Predicting the secondary structure of globular proteins using neural network models
Journal of Molecular Biology, 1988
Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins
Journal of Molecular Biology, 1978
Algorithms for prediction of α-helical and β-structural regions in globular proteins
Journal of Molecular Biology, 1974
Structural principles of the globular organization of protein chains. A stereochemical theory of globular protein secondary structure
Journal of Molecular Biology, 1974