Mining experimental evidence of molecular function claims from the literature
Open Access
- 17 October 2007
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (23) , 3232-3240
- https://doi.org/10.1093/bioinformatics/btm495
Abstract
Motivation: The rate at which gene-related findings appear in the scientific literature makes it difficult if not impossible for biomedical scientists to keep fully informed and up to date. The importance of these findings argues for the development of automated methods that can find, extract and summarize this information. This article reports on methods for determining the molecular function claims that are being made in a scientific article, specifically those that are backed by experimental evidence.
Results: The most significant result is that for molecular function claims based on direct assays, our methods achieved recall of 70.7% and precision of 65.7%. Furthermore, our methods correctly identified in the text 44.6% of the specific molecular function claims backed up by direct assays, but with a precision of only 0.92%, a disappointing outcome that led to an examination of the different kinds of errors. These results were based on an analysis of 1823 articles from the literature of Saccharomyces cerevisiae (budding yeast).
Availability: The annotation files for S.cerevisiae are available from ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/gene_association.sgd.gz. The draft protocol vocabulary is available by request from the first author.
Contact: crangle@converspeech.com
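The recall and precision figures reported in the abstract follow the standard information-retrieval definitions. The snippet below is a minimal sketch of how such figures can be computed, assuming each claim is represented as a (gene, molecular function) pair and is counted as correct only on an exact match with a curated annotation. The function name `precision_recall`, the pair representation, and the toy data are all hypothetical and are not taken from the article; the authors' actual evaluation procedure may differ.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for extracted claims against curated ones.

    Both arguments are iterables of hashable claims, here assumed to be
    (gene, molecular_function) pairs. Matching is by exact equality.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of extracted claims that are in the curated set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of curated claims that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not drawn from the article or from SGD.
gold = {
    ("geneA", "kinase activity"),
    ("geneB", "ATPase activity"),
    ("geneC", "DNA binding"),
}
predicted = {
    ("geneA", "kinase activity"),
    ("geneB", "ATPase activity"),
    ("geneD", "RNA binding"),
    ("geneE", "lipid binding"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.1%} recall={recall:.1%}")
```

Under these definitions, a system that proposes many candidate claims for each correct one can reach moderate recall while its precision stays very low, which is the pattern the abstract describes for specific molecular function claims.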