Literature mining and database annotation of protein phosphorylation using a rule-based system
Open Access
- 6 April 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (11), 2759-2765
- https://doi.org/10.1093/bioinformatics/bti390
Abstract
Motivation: A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. Although valuable, this information is only sparsely represented in databases because literature-based curation is laborious. Computational literature mining holds promise for facilitating database curation.
Results: A rule-based system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was used to extract protein phosphorylation information from MEDLINE abstracts. An annotation-tagged literature corpus developed at PIR was used to evaluate the system on two tasks: finding phosphorylation papers and extracting phosphorylation objects (kinases, substrates and sites) from abstracts. RLIMS-P achieved a precision of 91.4% and a recall of 96.4% for paper retrieval, and a precision of 97.9% and a recall of 88.0% for extraction of substrates and sites. By coupling high recall in paper retrieval with high precision in information extraction, RLIMS-P facilitates literature mining and database annotation of protein phosphorylation.
Availability: The program is available on request from the authors. The phosphorylation patterns and datasets used in this study are available at http://pir.georgetown.edu/iprolink/
Contact: zh9@georgetown.edu
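The two evaluation tasks above (extracting kinase, substrate and site objects with rules, then scoring them by precision and recall against a tagged corpus) can be illustrated with a minimal Python sketch. It assumes simple regular-expression templates; the three templates, the `PROT` and `SITE` token patterns, the function names, the example sentences and the gold set are all hypothetical and invented for illustration. They are not the published RLIMS-P patterns (those are distributed via the iProLINK site) and not the PIR corpus.

```python
import re
from typing import NamedTuple, Optional, Set, Tuple


class Phospho(NamedTuple):
    """One extracted phosphorylation object: (kinase, substrate, site)."""
    kinase: Optional[str]
    substrate: str
    site: Optional[str]


# Hypothetical building blocks, NOT the published RLIMS-P patterns.
PROT = r"[A-Za-z][A-Za-z0-9-]*"  # crude protein-name token, e.g. "p53", "GSK-3beta"
SITE = r"(?:Ser|Thr|Tyr|serine|threonine|tyrosine)[- ]?\d+"  # e.g. "Ser-15", "Tyr705"

# Each hypothetical template captures the same three named groups;
# optional parts that do not match come back as None.
TEMPLATES = [
    # "<kinase> phosphorylates <substrate> at <site>"
    re.compile(rf"(?P<kinase>{PROT}) phosphorylates (?P<substrate>{PROT})"
               rf"(?: (?:at|on) (?P<site>{SITE}))?"),
    # "phosphorylation of <substrate> at <site> by <kinase>"
    re.compile(rf"phosphorylation of (?P<substrate>{PROT})"
               rf"(?: (?:at|on) (?P<site>{SITE}))?(?: by (?P<kinase>{PROT}))?"),
    # "<substrate> is phosphorylated at <site> by <kinase>"
    re.compile(rf"(?P<substrate>{PROT}) is phosphorylated"
               rf"(?: (?:at|on) (?P<site>{SITE}))?(?: by (?P<kinase>{PROT}))?"),
]


def extract_phosphorylation(text: str) -> Set[Phospho]:
    """Apply every template to the text and collect the unique triples."""
    found: Set[Phospho] = set()
    for template in TEMPLATES:
        for m in template.finditer(text):
            found.add(Phospho(m.group("kinase"), m.group("substrate"), m.group("site")))
    return found


def precision_recall(predicted: Set[Phospho], gold: Set[Phospho]) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold|, using exact triple matches."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Invented example abstract and gold annotation, for illustration only.
TEXT = ("Akt phosphorylates GSK-3beta at Ser-9. "
        "DNA damage induces phosphorylation of p53 at Ser-15 by ATM. "
        "CREB is phosphorylated at Ser-133. "
        "Elk-1 phosphorylation was also increased.")
GOLD = {
    Phospho("Akt", "GSK-3beta", "Ser-9"),
    Phospho("ATM", "p53", "Ser-15"),
    Phospho(None, "CREB", "Ser-133"),
    Phospho(None, "Elk-1", None),  # phrasing not covered by any template
}

if __name__ == "__main__":
    predicted = extract_phosphorylation(TEXT)
    for triple in sorted(predicted, key=lambda t: t.substrate):
        print(triple)
    p, r = precision_recall(predicted, GOLD)
    print(f"precision={p:.2f} recall={r:.2f}")
```

On this toy input every extracted triple is correct, but the nominal phrasing "Elk-1 phosphorylation" matches no template, so precision is 1.00 and recall is 0.75. This mirrors, in miniature, the trade-off the abstract reports for a rule-based extractor: templates that fire tend to be accurate, while phrasings they do not anticipate lower recall.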