Combining evolutionary information and neural networks to predict protein secondary structure

1 May 1994

journal article
research article
Published by Wiley in Proteins: Structure, Function, and Bioinformatics

Vol. 19 (1) , 55-72
https://doi.org/10.1002/prot.340190108

Abstract

Using evolutionary information contained in multiple sequence alignments as input to neural networks, secondary structure can be predicted at significantly increased accuracy. Here, we extend our previous three-level system of neural networks by using additional input information derived from multiple alignments. Using a position-specific conservation weight as part of the input increases performance. Using the number of insertions and deletions reduces the tendency for overprediction and increases overall accuracy. Addition of the global amino acid content yields a further improvement, mainly in predicting structural class. The final network system has a sustained overall accuracy of 71.6% in a multiple cross-validation test on 126 unique protein chains. A test on a new set of 124 recently solved protein structures that have no significant sequence similarity to the learning set confirms the high level of accuracy. The average cross-validated accuracy for all 250 sequence-unique chains is above 72%. Using various data sets, the method is compared to alternative prediction methods, some of which also use multiple alignments: the performance advantage of the network system is at least 6 percentage points in three-state accuracy. In addition, the network estimates secondary structure content from multiple sequence alignments about as well as circular dichroism spectroscopy on a single protein and classifies 75% of the 250 proteins correctly into one of four protein structural classes. Of particular practical importance is the definition of a position-specific reliability index. For 40% of all residues the method has a sustained three-state accuracy of 88%, as high as the overall average for homology modelling. A further strength of the method is greatly increased accuracy in predicting the placement of secondary structure segments.
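
The three-state accuracy and the reliability-filtered accuracy quoted above can be illustrated with a minimal sketch. It is not the authors' implementation. The state labels (H, E, L), the function names, and the definition of the reliability index as the scaled gap between the two largest network outputs are all assumptions made here for illustration; the abstract does not give the formula.

```python
from typing import List, Sequence, Tuple


def q3(observed: str, predicted: str) -> float:
    """Three-state accuracy: percentage of residues predicted in the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def reliability_index(outputs: Sequence[float]) -> int:
    """Hypothetical per-residue index from 0 to 9.

    Assumption: the index grows with the gap between the largest and the
    second-largest output value of the network for that residue.
    """
    top, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (top - second)))


def accuracy_at_threshold(
    observed: str,
    predicted: str,
    reliabilities: List[int],
    threshold: int,
) -> Tuple[float, float]:
    """Return (coverage %, accuracy %) for residues with index >= threshold."""
    kept = [
        (o, p)
        for o, p, r in zip(observed, predicted, reliabilities)
        if r >= threshold
    ]
    if not kept:
        return 0.0, 0.0
    coverage = 100.0 * len(kept) / len(observed)
    accuracy = 100.0 * sum(o == p for o, p in kept) / len(kept)
    return coverage, accuracy
```

Raising the threshold trades coverage for accuracy, which is the kind of trade-off behind a statement such as "for 40% of all residues the method has a sustained three-state accuracy of 88%".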
