An optimization approach to predicting protein structural class from amino acid composition

Abstract
Proteins are generally classified into four structural classes: all‐α proteins, all‐β proteins, α+β proteins, and α/β proteins. In this article, a protein is represented as a vector in 20‐dimensional space whose 20 components are defined by its amino acid composition, one component per amino acid. On this basis, a new method, the maximum component coefficient method, is proposed for predicting the structural class of a protein from its amino acid composition. Compared with existing methods, the new method yields a higher overall prediction accuracy. The improvement is most marked for all‐α proteins, for which the rate of correct prediction is much higher than that of any existing method. For instance, for the 19 all‐α proteins investigated previously by P.Y. Chou, his method gave a correct prediction rate of 84.2%, whereas the new method gives 100%. Furthermore, the new method has a clear physical interpretation: the vector representing the protein to be predicted is decomposed into four component vectors, each of which corresponds to the norm of one of the four protein structural classes.
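The decomposition described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the optimization problem itself, so everything here is an assumption made for illustration: each component is taken to be the fraction of one amino acid in the sequence, the protein vector is fitted as a non-negative combination of the four class norms with coefficients summing to 1, the fit is found by projected gradient descent, and the predicted class is the one with the largest coefficient. The function names (composition, decompose, predict_class) and the TOY_NORMS vectors are hypothetical and are not data or code from the article.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(sequence):
    """Return the 20 amino acid fractions of a sequence (they sum to 1)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_to_simplex(v):
    """Euclidean projection of v onto {c : c_k >= 0, sum(c) = 1}."""
    theta, running = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        running += u
        candidate = (running - 1.0) / i
        if u - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def decompose(x, norms, iterations=5000):
    """Assumed formulation: minimise |x - sum_k c_k * norm_k|^2 on the simplex."""
    names = list(norms)
    vectors = [norms[name] for name in names]
    gram = [[dot(a, b) for b in vectors] for a in vectors]
    rhs = [dot(a, x) for a in vectors]
    # Step 1/trace(gram) never exceeds 1/(largest eigenvalue), so descent is stable.
    step = 1.0 / sum(gram[k][k] for k in range(len(names)))
    c = [1.0 / len(names)] * len(names)
    for _ in range(iterations):
        grad = [dot(gram[k], c) - rhs[k] for k in range(len(names))]
        c = project_to_simplex([ck - step * gk for ck, gk in zip(c, grad)])
    return dict(zip(names, c))


def predict_class(sequence, norms):
    """Predict the class whose norm carries the largest coefficient."""
    coefficients = decompose(composition(sequence), norms)
    return max(coefficients, key=coefficients.get), coefficients


# Toy norms for demonstration only: one-hot vectors, NOT values from the article.
# A real norm would presumably be derived from proteins of known structural class.
TOY_NORMS = {
    "all-alpha": composition("AAAA"),
    "all-beta": composition("CCCC"),
    "alpha+beta": composition("DDDD"),
    "alpha/beta": composition("EEEE"),
}
label, coefficients = predict_class("AAAC", TOY_NORMS)
```

With realistic norms the four vectors are strongly correlated, so the projected gradient loop may need more iterations than in this toy case; a dedicated constrained least-squares solver would be a reasonable substitute.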
Funding Information
  • Tianjin University