Abstract
Protein interaction networks are an important part of the post-genomic effort to integrate a part-list view of the cell into system-level understanding. Using a set of 11 yeast genomes we show that combining comparative genomics and secondary structure information greatly increases consensus-based prediction of SH3 targets. Benchmarking of our method against positive and negative standards gave 83% accuracy with 26% coverage. The concept of an optimal divergence time for effective comparative genomics studies was analyzed, demonstrating that genomes of species that diverged very recently from Saccharomyces cerevisiae (S. mikatae, S. bayanus, and S. paradoxus), or a long time ago (Neurospora crassa and Schizosaccharomyces pombe), contain less information for accurate prediction of SH3 targets than species within the optimal divergence time proposed. We also show here that intrinsically disordered SH3 domain targets are more probable sites of interaction than equivalent sites within ordered regions. Our findings highlight several novel S. cerevisiae SH3 protein interactions, the value of selection of optimal divergence times in comparative genomics studies, and the importance of intrinsic disorder for protein interactions. Based on our results we propose novel roles for the S. cerevisiae proteins Abp1p in endocytosis and Hse1p in endosome protein sorting.
How can we tackle the complexity of a living cell? It is commonly said that living organisms are complex and display “emergent” properties. Emergence is perceived in this context as behaviors that appear at the system level but are not observable at the level of the system's components. In the cell this would be equivalent to saying that the cellular complexity could be explained if we could understand the interplay between the cellular components: that is, not just describe the “parts” that make up a cell but understand how they interact with each other to perform the necessary tasks.
A big step on the road to understanding cellular complexity will be a complete list of all relevant interactions between the cellular components. Although a lot of progress has been made in this direction, we are often dependent on experimental methods that are costly and time consuming. It is a big challenge for computational biology to process the currently available knowledge and to propose new ways of predicting the interactions between cellular components. Here the researchers studied protein interactions that are mediated by small linear peptide motifs, specifically interactions between a protein's SH3 domain and its targets, usually small peptide stretches containing a PXXP motif (where P is proline and X is any amino acid). The results showed that putative target motifs that are conserved in orthologous proteins and lie within regions that do not have a defined secondary structure are more likely to be relevant binding sites. Besides proposing a way to combine secondary structure information with comparative genomics to predict protein–protein interactions, the researchers highlight a possible role of intrinsically disordered proteins in SH3 protein interactions. The results also show that when looking for conservation of these motifs, it is important to carefully select the species used in the study: comparisons between species that have diverged to a certain extent (not too little and not too much) are the most informative.
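The filtering idea described above (keep PXXP motifs that sit in regions without defined secondary structure and that are conserved in orthologs) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the per-residue disorder flags, the pre-aligned ortholog sequences, and the `min_conserved` threshold are all hypothetical inputs chosen for illustration, and the sketch assumes the query sequence itself contains no alignment gaps.

```python
import re

# PXXP: proline, any two residues, proline. The lookahead makes
# overlapping occurrences (e.g. in "PPLPP") all get reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(seq):
    """Return the 0-based start index of every PXXP occurrence in seq."""
    return [m.start() for m in PXXP.finditer(seq)]


def candidate_sites(seq, disordered, aligned_orthologs, min_conserved=0.5):
    """Keep PXXP sites that are disordered and conserved in orthologs.

    seq               -- query protein sequence (assumed ungapped)
    disordered        -- list of bools, one per residue; True where the
                         residue is predicted to lack secondary structure
                         (hypothetical input from any disorder predictor)
    aligned_orthologs -- ortholog sequences already aligned to seq, same
                         length as seq, '-' for gaps (hypothetical input)
    min_conserved     -- illustrative threshold, not taken from the paper
    """
    sites = []
    for start in find_pxxp(seq):
        end = start + 4
        # Require all four motif residues to lie in a disordered region.
        if not all(disordered[start:end]):
            continue
        # Fraction of orthologs that keep both prolines at the same columns.
        if aligned_orthologs:
            hits = sum(
                1 for o in aligned_orthologs
                if o[start] == "P" and o[start + 3] == "P"
            )
            fraction = hits / len(aligned_orthologs)
        else:
            fraction = 0.0
        if fraction >= min_conserved:
            sites.append((start, seq[start:end], fraction))
    return sites
```

Each returned tuple holds the motif start position, the four-residue motif, and the fraction of orthologs in which both prolines are retained. Which species go into `aligned_orthologs` matters: per the text, orthologs from species at an intermediate divergence from S. cerevisiae are expected to be the most informative.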