Evaluation of linguistic features useful in extraction of interactions from PubMed; Application to annotating known, high-throughput and predicted interactions in I2D
Open Access
- 22 October 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (1) , 111-119
- https://doi.org/10.1093/bioinformatics/btp602
Abstract
Motivation: Identification and characterization of protein–protein interactions (PPIs) is one of the key aims in biological research. While previous research in text mining has made substantial progress in automatic PPI detection from literature, the need to improve the precision and recall of the process remains. More accurate PPI detection will also improve the ability to extract experimental data related to PPIs and provide multiple evidence for each interaction.
Results: We developed an interaction detection method and explored the usefulness of various features in automatically identifying PPIs in text. The results show that our approach outperforms other systems using the AImed dataset. In the tests where our system achieves better precision with reduced recall, we discuss possible approaches for improvement. In addition to test datasets, we evaluated the performance on interactions from five human-curated databases (BIND, DIP, HPRD, IntAct and MINT), where our system consistently identified evidence for ∼60% of interactions when both proteins appear in at least one sentence in the PubMed abstract. We then applied the system to extract articles from PubMed to annotate known, high-throughput and interologous interactions in I2D.
Availability: The data and software are available at: http://www.cs.utoronto.ca/~juris/data/BI09/.
Contact: yniu@uhnres.utoronto.ca; juris@ai.utoronto.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
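The abstract evaluates the system only on interactions whose two proteins appear together in at least one sentence, and reports results in terms of precision and recall. The sketch below illustrates those two ideas and nothing more: it is not the authors' detection method, which relies on linguistic features not described here. The function names, the naive sentence splitter, the exact-match name lookup, and the toy text and gold pairs are all assumptions made up for illustration.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurring_pairs(text, protein_names):
    """Return unordered protein pairs that share at least one sentence."""
    pairs = set()
    for sentence in split_sentences(text):
        # Exact whole-word matching; real protein name recognition is much harder.
        found = sorted(
            name for name in protein_names
            if re.search(r"\b" + re.escape(name) + r"\b", sentence)
        )
        pairs.update(combinations(found, 2))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of a predicted pair set against a gold pair set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy text and gold standard, invented for this example only.
toy_abstract = (
    "RAD51 binds BRCA2 in vitro. "
    "TP53 was also measured. "
    "MDM2 levels were unchanged in cells expressing BRCA2."
)
names = {"RAD51", "BRCA2", "TP53", "MDM2"}
gold = {("BRCA2", "RAD51"), ("MDM2", "TP53")}

candidates = cooccurring_pairs(toy_abstract, names)
precision, recall = precision_recall(candidates, gold)
```

In this toy case, co-occurrence alone produces one correct pair and one spurious pair, and it misses the gold pair whose proteins never share a sentence. That gap is the reason a co-occurrence filter is a precondition for sentence-level extraction rather than a detector in its own right.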