Genome-Wide Prediction of SH2 Domain Targets Using Structural Information and the FoldX Algorithm

Abstract
Current experiments likely cover only a fraction of all protein-protein interactions. Here, we developed a method to predict SH2-mediated protein-protein interactions using the structure of SH2-phosphopeptide complexes and the FoldX algorithm. We show that our approach performs similarly to experimentally derived consensus sequences and substitution matrices at predicting known in vitro and in vivo targets of SH2 domains. We use our method to provide a set of high-confidence interactions for human SH2 domains with known structure, filtered on secondary structure and phosphorylation state. We validated the predictions using literature-derived SH2 interactions and a probabilistic score obtained from a naive Bayes integration of information on coexpression, conservation of the interaction in other species, shared interaction partners, and functions. We show how our predictions lead to a new hypothesis for the role of SH2 domains in signaling.
Understanding the functional role of every protein in the cell is a long-standing goal of cellular biology. An important step in this direction is to discover how and when proteins interact inside the cell to accomplish their tasks. Many cellular functions depend on reversible protein modifications such as phosphorylation. To sense these modifications, cells have protein domains capable of binding phosphorylated proteins, such as the SH2 domain. In this work, we show that it is possible to use the three-dimensional structure of protein domains to predict their binding preferences. Using a computational tool called FoldX, we have predicted the binding specificity of several human SH2 domains. These predictions, based on computational analysis of the 3-D structure, were shown to be similar in accuracy to those obtained from experimental binding assays.
We also show that it is possible to understand how a mutation changes the binding preference of protein binding domains, opening the way to a better understanding of some disease-causing mutations. Combining this novel computational approach with other sources of information allowed us to provide a set of high-confidence novel interactions for the proteins studied here.