On Evaluating MHC-II Binding Peptide Prediction Methods
Open Access
- 24 September 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 3 (9) , e3268
- https://doi.org/10.1371/journal.pone.0003268
Abstract
The choice of one method over another for MHC-II binding peptide prediction is typically based on published reports of estimated performance on standard benchmark datasets. We show that several standard benchmark datasets of unique peptides used in such studies contain a substantial number of peptides that share a high degree of sequence identity with one or more other peptides in the same dataset. Thus, in a standard cross-validation setup, the test set and the training set are likely to contain sequences that share a high degree of sequence identity with each other, leading to overly optimistic estimates of performance. Hence, to assess the relative performance of different prediction methods more rigorously, we explore the use of similarity-reduced datasets. We introduce three similarity-reduced MHC-II benchmark datasets derived from the MHCPEP, MHCBN, and IEDB databases. We compare the performance of three MHC-II binding peptide prediction methods as estimated on datasets of unique peptides with their performance on the similarity-reduced counterparts of the same datasets, and find that the former estimates can be rather optimistic. Furthermore, our results demonstrate that conclusions about the superiority of one method over another, drawn from performance estimates on commonly used datasets of unique peptides, are often contradicted by the observed performance of the methods on the similarity-reduced versions of the same datasets. These results underscore the importance of using similarity-reduced datasets when rigorously comparing the performance of alternative MHC-II peptide prediction methods.
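To make the idea of a similarity-reduced dataset concrete, the following is a minimal Python sketch of one way such a reduction could be done. It is not the procedure used in the article, which the abstract does not specify. The function names `identity` and `similarity_reduce`, the ungapped sliding-window identity measure, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def identity(a: str, b: str) -> float:
    """Best ungapped identity between two peptides (illustrative measure only).

    The shorter peptide is slid along the longer one; the score is the
    highest number of matching positions divided by the shorter length.
    """
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(s == l for s, l in zip(short, long_[offset:]))
        best = max(best, matches)
    return best / len(short)


def similarity_reduce(peptides, threshold=0.8):
    """Greedily keep a peptide only if its identity to every peptide
    already kept is below `threshold` (assumed cutoff, not from the article)."""
    kept = []
    for pep in peptides:
        if all(identity(pep, k) < threshold for k in kept):
            kept.append(pep)
    return kept


if __name__ == "__main__":
    # Hypothetical peptides: the second differs from the first at one position,
    # and the fourth is contained within the first.
    unique_peptides = ["AAAKLMNPQ", "AAAKLMNPR", "GGGHHHWWW", "AKLMNPQ"]
    print(similarity_reduce(unique_peptides))
```

In this sketch all four peptides are unique as strings, yet two of them would be dropped as near-duplicates. A cross-validation split over the unreduced list could place such near-duplicates on opposite sides of the train/test boundary, which is the source of optimism the abstract describes. The greedy pass depends on input order, so a different ordering may retain a different subset.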