Evaluation and improvement of multiple sequence methods for protein secondary structure prediction
Open Access
- 1 March 1999
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 34 (4) , 508-519
- https://doi.org/10.1002/(sici)1097-0134(19990301)34:4<508::aid-prot10>3.0.co;2-4
Abstract
A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for a combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396‐protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8‐ to 3‐state reduction methods shows variation of over 3% in apparent prediction accuracy. This suggests that care should be taken to compare methods using the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross‐validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/. Proteins 1999;34:508–519.
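The effect of the 8‐ to 3‐state reduction on apparent Q3 accuracy, and the idea of a per-residue consensus, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two mappings (`REDUCE_BROAD`, `REDUCE_STRICT`) are commonly used reductions and are not necessarily the ones compared in the article, the sequences are toy data, and the majority vote with its tie-breaking rule is a hypothetical stand-in for the consensus scheme the abstract mentions.

```python
from collections import Counter

# Two illustrative 8- to 3-state reductions of DSSP codes (assumed, not
# taken from the article). Any code not listed is treated as coil (C).
REDUCE_BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCE_STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map an 8-state DSSP string to the three states H, E and C."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions):
    """Per-residue majority vote over several 3-state predictions.

    Hypothetical tie-break: among the most frequent states, keep the one
    proposed by the earliest-listed method.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        result.append(next(s for s in column if counts[s] == top))
    return "".join(result)


# Toy data: one observed DSSP string and one 3-state prediction.
observed_dssp = "HHHHGGGTTEEEEBSS"
predicted = "HHHHHHHCCEEEECCC"

for name, mapping in (("broad", REDUCE_BROAD), ("strict", REDUCE_STRICT)):
    observed_3 = reduce_states(observed_dssp, mapping)
    print(f"{name:6s} reduction: Q3 = {q3(predicted, observed_3):.2f}%")

print(consensus(["HHEC", "HEEC", "CHEE"]))
```

On this toy example the same prediction scores differently under the two reductions, which is the kind of discrepancy the abstract warns about when methods are compared using different reduction schemes.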