A critical cross-validation of high throughput structural binding prediction methods for pMHC

Abstract
T-cells recognize antigens via their T-cell receptors (TCRs). The major histocompatibility complex (MHC) binds antigenic peptides in a specific way, transports them to the cell surface, and presents them to the TCR. Many in silico approaches have been developed to predict the binding characteristics of potential T-cell epitopes (peptides), most of them based solely on the amino acid sequence. We present a structural approach that provides insight into the spatial binding geometry. We combine different tools for side-chain substitution (threading) and energy minimization with scoring methods for protein/peptide interfaces. The focus of this study is on high data throughput in combination with accurate results. These methods are not meant to predict the exact binding free energy, but to guide the classification of peptides into potential binders and peptides that definitely do not bind to a given MHC structure. In total, we performed approximately 83,000 binding affinity prediction runs to evaluate interactions between peptides and MHCs, using different combinations of tools. Depending on the tools used, the prediction quality ranged from almost random to around 75% accuracy in correctly classifying a peptide as either a binder or a non-binder. The prediction quality depends strongly on all three evaluation steps, namely threading of the peptide, energy minimization, and scoring.
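
To make the reported accuracy figure concrete, the following minimal sketch shows one way a binder/non-binder classification could be derived from interface scores and then compared against experimental labels. It is an illustration only, not the evaluation code used in this study. The function names, the threshold, and the convention that lower scores indicate stronger predicted binding are all assumptions, and the numerical values are invented for demonstration.

```python
from typing import List, Sequence


def classify_binders(scores: Sequence[float], threshold: float) -> List[bool]:
    """Label a peptide as a predicted binder if its score is below the threshold.

    Assumption: scores are energy-like, so lower means stronger predicted binding.
    """
    return [score < threshold for score in scores]


def accuracy(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Fraction of peptides whose predicted label matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one peptide is required")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(predicted)


# Invented example values (not data from the study).
scores = [-12.5, -3.1, -9.8, -1.0]      # hypothetical interface scores
observed = [True, False, False, False]  # hypothetical experimental labels
predicted = classify_binders(scores, threshold=-5.0)
acc = accuracy(predicted, observed)
```

In this sketch, a balanced test set classified at random would be expected to yield an accuracy near 0.5, which is the baseline against which the figures above should be read.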