The importance of larger data sets for protein secondary structure prediction with neural networks

Open Access

1 April 1996

journal article
research article
Published by Wiley in Protein Science

Vol. 5 (4) , 768-774
https://doi.org/10.1002/pro.5560050422

Abstract

A neural network algorithm is applied to secondary structure and structural class prediction for a database of 318 nonhomologous protein chains. Accuracy improves significantly compared with performance on smaller databases. A systematic study of the effects of network topology shows that, for the larger database, better results are obtained with more units in the hidden layer. In a 32-fold cross-validated test, secondary structure prediction accuracy is 67.0%, relative to 62.6% obtained previously, without any evolutionary information on the sequence. Introduction of sequence profiles increases this value to 72.9%, suggesting that the two types of information are essentially independent. Tertiary structural class is predicted with 80.2% accuracy, relative to 73.9% obtained previously. The use of a larger database is facilitated by the introduction of a scaled conjugate gradient algorithm for optimizing the neural network. This algorithm is about 10–20 times as fast as the standard steepest descent algorithm.
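To make the evaluation protocol concrete, the following is a minimal sketch of how 318 chains might be divided into 32 cross-validation folds and how per-residue accuracy could be scored. The abstract does not say how the folds were assembled or how the accuracy was pooled, so the random assignment by chain, the function names `make_folds` and `residue_accuracy`, and the one-letter state strings are all assumptions made for illustration, not the authors' procedure.

```python
import random


def make_folds(n_chains, k, seed=0):
    """Shuffle chain indices and deal them into k nearly equal folds.

    Assumption: folds are built per chain, so all residues of a chain
    fall in the same fold. The source does not specify this.
    """
    indices = list(range(n_chains))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th chain starting at position i.
    return [indices[i::k] for i in range(k)]


def residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# 318 chains in 32 folds: each fold is held out once while the
# network is trained on the remaining 31 folds.
folds = make_folds(318, 32)
fold_sizes = sorted(len(fold) for fold in folds)

# Toy example with hypothetical three-state labels (H, E, C).
example_score = residue_accuracy("HHEC", "HHCC")
```

With 318 chains and 32 folds, each held-out fold contains 9 or 10 chains, so every network in the cross-validation is trained on roughly 97% of the database.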

Funding Information

National Science Foundation
