How good is prediction of protein structural class by the component-coupled method?
- 1 February 2000
- journal article
- research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 38 (2), 165-175
- https://doi.org/10.1002/(sici)1097-0134(20000201)38:2<165::aid-prot5>3.0.co;2-v
Abstract
Proteins of known structure are usually classified into four structural classes: all-α, all-β, α+β, and α/β proteins. A number of methods for predicting the structural class of a protein from its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle, and yields much better prediction results than the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest reported accuracies for this method are near 100%, but the lowest is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of the prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the four protein folding classes are governed by normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein. The apparently high accuracy (more than 90%) attained in the previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins. Proteins 2000;38:165–175.
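The two decision rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to drop the last composition component (the 20 fractions sum to 1, so the full covariance matrix would be singular), and the use of sample means and covariances are all assumptions made here for illustration. Under a normality assumption, the minimum-error Bayes rule adds a log-determinant term and a prior term to the squared Mahalanobis distance.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def composition(seq):
    """Amino acid composition of a sequence, as 19 fractions.

    Assumption: the last component is dropped because the 20 fractions
    sum to 1, which would make the covariance matrix singular.
    """
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return (counts / counts.sum())[:-1]


def fit_class(X):
    """Estimate mean vector and covariance matrix from rows of X."""
    return X.mean(axis=0), np.cov(X, rowvar=False)


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of x from a class (mean, cov)."""
    d = x - mean
    return float(d @ np.linalg.solve(cov, d))


def bayes_score(x, mean, cov, prior):
    """Gaussian minimum-error score: smaller means more probable class.

    Equals -2 * log(prior * normal density) up to a class-independent constant.
    """
    _, logdet = np.linalg.slogdet(cov)
    return mahalanobis_sq(x, mean, cov) + logdet - 2.0 * np.log(prior)


def predict(x, models, priors=None, rule="bayes"):
    """Return the label with the smallest score under the chosen rule."""
    scores = {}
    for label, (mean, cov) in models.items():
        if rule == "mahalanobis":
            scores[label] = mahalanobis_sq(x, mean, cov)
        else:
            scores[label] = bayes_score(x, mean, cov, priors[label])
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Synthetic, hypothetical data in 3 dimensions; not protein data.
    rng = np.random.default_rng(0)
    models = {
        "class_a": fit_class(rng.normal(0.0, 1.0, size=(200, 3))),
        "class_b": fit_class(rng.normal(5.0, 1.0, size=(200, 3))),
    }
    priors = {"class_a": 0.5, "class_b": 0.5}
    query = np.array([5.0, 5.0, 5.0])
    print(predict(query, models, priors))
    print(predict(query, models, rule="mahalanobis"))
```

With equal priors and equal covariance determinants the two rules pick the same class; they differ only when the classes have different spreads or different prior frequencies. Estimating a 19-dimensional covariance matrix requires many more than 19 training proteins per class for the matrix to be invertible.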