Prediction of the secondary structure content of globular proteins based on structural classes

Abstract
The prediction of the secondary structure content (α-helix and β-strand content) of a globular protein may play an important complementary role in the prediction of the protein's structure. We propose a new prediction algorithm based on Chou's database [Chou (1995), Proteins Struct. Funct. Genet. 21, 319]. The new algorithm is an improved multiple linear regression method that takes the nonlinear and coupling terms of the frequencies of different amino acids into account. The prediction is also based on the structural classes of proteins. A resubstitution examination of the algorithm shows that the average errors are 0.040 and 0.033 for the prediction of α-helix content and β-strand content, respectively. A cross-validation examination, the jackknife analysis, shows that the average errors are 0.051 and 0.044 for the prediction of α-helix content and β-strand content, respectively. Both examinations indicate the self-consistency and the extrapolative effectiveness of the new algorithm. Compared with other currently available methods, our method has the merits of simplicity and ease of use, as well as high prediction accuracy. Because the prediction of the structural class is incorporated, the only input of our method is the amino acid composition of the protein to be predicted.
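
The general idea of a regression on amino acid frequencies with nonlinear and coupling terms, checked by resubstitution and by the jackknife, can be illustrated with a short sketch. The abstract does not specify the exact terms of the model or the definition of the average error, so the following is only one plausible reading: the quadratic terms, the user-chosen coupled pairs, the use of mean absolute error, and all function names (`composition`, `expand_features`, `fit`, `resubstitution_error`, `jackknife_error`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20 amino acid frequencies of a protein sequence."""
    counts = np.array([sequence.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / counts.sum()


def expand_features(freqs, coupled=()):
    """Build the regression vector [1, f_i, f_i^2, f_i * f_j].

    Assumption: squares stand in for the "nonlinear" terms and products of
    the index pairs in `coupled` stand in for the "coupling" terms.
    """
    f = np.asarray(freqs, dtype=float)
    cross = [f[i] * f[j] for i, j in coupled]
    return np.concatenate(([1.0], f, f ** 2, cross))


def fit(X, y):
    """Least-squares coefficients; lstsq tolerates collinear columns."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def resubstitution_error(X, y):
    """Mean absolute error when the training proteins are predicted again."""
    return float(np.mean(np.abs(X @ fit(X, y) - y)))


def jackknife_error(X, y):
    """Mean absolute error when each protein is left out of the fit in turn."""
    errors = []
    for k in range(len(y)):
        keep = np.arange(len(y)) != k
        coef = fit(X[keep], y[keep])
        errors.append(abs(X[k] @ coef - y[k]))
    return float(np.mean(errors))
```

To follow the class-based scheme described above, one would presumably build a separate design matrix `X` and target vector `y` (the observed α-helix or β-strand content) for each structural class and fit one coefficient vector per class. The jackknife error is normally somewhat larger than the resubstitution error, as in the figures quoted in the abstract, because the left-out protein plays no part in the fit.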