Prediction of protein structural class by amino acid and polypeptide composition
- 28 August 2002
- journal article
- research article
- Published by Wiley in European Journal of Biochemistry
- Vol. 269 (17), 4219-4225
- https://doi.org/10.1046/j.1432-1033.2002.03115.x
Abstract
A new approach for predicting the structural classes of protein domain sequences is presented in this paper. Besides the amino acid composition, the compositions of several dipeptides, tripeptides, tetrapeptides, pentapeptides and hexapeptides are taken into account, based on stepwise discriminant analysis. The jackknife test shows that this new approach can lead to higher predictive sensitivity and specificity for datasets with reduced sequence similarity. For the dataset PDB40‐B constructed by Brenner and colleagues, 75.2% of protein domain sequences are correctly assigned in the jackknife test to the four structural classes (all‐α, all‐β, α/β and α + β). This is an improvement of 19.4% in the jackknife test and 25.5% in the resubstitution test over the component‐coupled algorithm using amino acid composition alone (AAC approach) on the same dataset. In the cross‐validation test with dataset PDB40‐J constructed by Park and colleagues, more than 80% predictive accuracy is obtained. Furthermore, for the dataset constructed by Chou and Maggiora, accuracies of 100% and 99.7% are achieved in the resubstitution test and the jackknife test, respectively, taking only the composition of dipeptides into account. This new method therefore provides an effective tool for extracting valuable information from protein sequences, which can be used for the systematic analysis of small or medium‐sized protein sequences. The computer programs used in this paper are available on request.
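The feature construction and jackknife test described in the abstract can be illustrated with a minimal sketch. It is not the authors' program: the function names are hypothetical, the peptide list is supplied by hand rather than chosen by stepwise discriminant analysis, and a simple nearest-centroid rule stands in for the discriminant classifier used in the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each standard residue in the sequence (20 values)."""
    counts = Counter(sequence)
    length = len(sequence)
    return [counts[a] / length if length else 0.0 for a in AMINO_ACIDS]


def peptide_frequency(sequence, peptide):
    """Fraction of overlapping windows of len(peptide) that equal peptide."""
    k = len(peptide)
    n_windows = len(sequence) - k + 1
    if n_windows <= 0:
        return 0.0
    hits = sum(1 for i in range(n_windows) if sequence[i:i + k] == peptide)
    return hits / n_windows


def feature_vector(sequence, peptides):
    """Amino acid composition followed by the composition of chosen peptides.

    `peptides` is assumed to be given; the paper selects them by stepwise
    discriminant analysis, which this sketch does not implement.
    """
    return amino_acid_composition(sequence) + [
        peptide_frequency(sequence, p) for p in peptides
    ]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean vector."""
    sums, counts = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((a / counts[label] - xj) ** 2 for a, xj in zip(acc, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

In a resubstitution test the same samples are used for both training and prediction, which is why it tends to give more optimistic accuracy than the jackknife test.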
This publication has 34 references indexed in Scilit:
- SCOP: A structural classification of proteins database for the investigation of sequences and structures. Published by Elsevier, 2006
- Protein domain decomposition using a graph-theoretic approach. Bioinformatics, 2000
- Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology, 1998
- Accurate prediction of protein secondary structural class with fuzzy structural vectors. Protein Engineering, Design and Selection, 1995
- Prediction of Protein Structural Classes. Critical Reviews in Biochemistry and Molecular Biology, 1995
- Improvements in protein secondary structure prediction by an enhanced neural network. Journal of Molecular Biology, 1990
- Prediction of protein structural class by discriminant analysis. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology, 1986
- Prediction of protein structural class from the amino acid sequence. Biopolymers, 1986
- Structural patterns in globular proteins. Nature, 1976
- Principles that Govern the Folding of Protein Chains. Science, 1973