Filtering high-throughput protein-protein interaction data using a combination of genomic features
Open Access
- 18 April 2005
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (1), 100
- https://doi.org/10.1186/1471-2105-6-100
Abstract
Background: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies.

Results: In this study, we use a combination of 3 genomic features – structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology – as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at http://helix.protein.osaka-u.ac.jp/htp/.

Conclusion: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction.
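The likelihood-ratio idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pseudocount smoothing, the cutoff of 1.0 and the toy gold-standard sets below are all assumptions made for illustration. Each interaction is represented by a tuple of three binary features (interacting Pfam domains, shared Gene Ontology annotation, homologous interaction), and the likelihood ratio of a feature combination is the probability of seeing it among known true interactions divided by the probability of seeing it among known false ones.

```python
from collections import Counter


def likelihood_ratios(positives, negatives, pseudocount=1.0):
    """Return {feature_tuple: LR}, where LR = P(features | true) / P(features | false).

    positives / negatives are lists of feature tuples taken from hypothetical
    gold-standard sets. The pseudocount (an assumption of this sketch) avoids
    division by zero for combinations missing from one set.
    """
    pos_counts = Counter(positives)
    neg_counts = Counter(negatives)
    states = set(pos_counts) | set(neg_counts)
    k = len(states)
    ratios = {}
    for state in states:
        p_pos = (pos_counts[state] + pseudocount) / (len(positives) + pseudocount * k)
        p_neg = (neg_counts[state] + pseudocount) / (len(negatives) + pseudocount * k)
        ratios[state] = p_pos / p_neg
    return ratios


def sensitivity_specificity(ratios, positives, negatives, cutoff=1.0):
    """Call an interaction reliable when its LR exceeds the cutoff, then score the calls."""
    true_pos = sum(1 for s in positives if ratios[s] > cutoff)
    true_neg = sum(1 for s in negatives if ratios[s] <= cutoff)
    return true_pos / len(positives), true_neg / len(negatives)


# Toy data, invented for illustration: (pfam, go, homology) flags per interaction.
toy_positives = [(1, 1, 1)] * 6 + [(1, 0, 0)] * 3 + [(0, 0, 0)] * 1
toy_negatives = [(0, 0, 0)] * 7 + [(1, 0, 0)] * 2 + [(1, 1, 1)] * 1

toy_ratios = likelihood_ratios(toy_positives, toy_negatives)
toy_sens, toy_spec = sensitivity_specificity(toy_ratios, toy_positives, toy_negatives)
```

In this toy setting, interactions supported by all three features receive the highest ratio and unsupported ones the lowest, which mirrors the qualitative claim that support from one or more genomic features raises the likelihood ratio. The sensitivity and specificity reported in the abstract come from the authors' own data and are not reproduced by this sketch.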