Semantically linking and browsing PubMed abstracts with gene ontology
Open Access
- 20 March 2008
- journal article
- research article
- Published by Springer Nature in BMC Genomics
- Vol. 9 (S1), S10
- https://doi.org/10.1186/1471-2164-9-s1-s10
Abstract
Background: Technological advances in the past decade have led to massive progress in the field of biotechnology, and this progress is documented in research articles. PubMed is currently the most widely used repository of biomedical literature. As of 2007 it held about 17 million abstracts, a volume that calls for methods to efficiently retrieve and browse relevant information. State-of-the-art tools such as GOPubmed use simple keyword-based techniques to retrieve abstracts from PubMed and link them to the Gene Ontology (GO). This paper changes the paradigm by introducing a semantics-enabled technique for linking PubMed to the Gene Ontology, called SEGOPubmed, for ontology-based browsing. The Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts with the Gene Ontology.

Results: An empirical analysis is performed to compare the performance of SEGOPubmed with that of GOPubmed. The analysis is first performed using a few well-referenced query words; a statistical analysis is then performed using a GO curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed because it incorporates semantics.

Conclusions: The LSA technique is applied to the PubMed abstracts retrieved for a user query, and abstracts are selected according to the semantic similarity between the query and the abstracts. The analyses using well-referenced keywords show that the proposed semantics-sensitive technique outperformed string-comparison-based techniques in associating relevant abstracts with GO terms. SEGOPubmed also extracted abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms), which simple term-matching techniques could not retrieve.
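To make the LSA linking step concrete, here is a minimal Python sketch of generic LSA retrieval. It is not the SEGOPubmed implementation: the paper's preprocessing, term weighting and rank are not given here. The sketch assumes TF-IDF weighting, a hypothetical toy corpus (`ABSTRACTS`), a GO term represented only by its name text, and a caller-chosen rank `k`; all function names are hypothetical. Documents and the GO term text are projected onto the `k` leading left singular vectors of the term-by-document matrix and compared by cosine similarity. A simplified exact-phrase check stands in for keyword-based matching.

```python
import math
import re
from collections import Counter

import numpy as np

# Hypothetical toy "abstracts"; real input would be PubMed abstracts retrieved for a user query.
ABSTRACTS = [
    "Apoptosis is regulated by caspase activation in tumor cells.",
    "Programmed cell death requires caspase activity and mitochondrial signals.",
    "Ribosome assembly controls protein translation rates.",
    "Caspase inhibitors block apoptosis in neurons.",
]


def tokenize(text):
    """Lowercase alphabetic tokens (no stop-word removal or stemming in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_doc_matrix(docs):
    """Build a TF-IDF weighted term-by-document matrix (rows = terms, columns = docs)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    df = Counter(t for toks in tokenized for t in set(toks))
    # Assumed weighting: smoothed IDF so that terms present in every doc keep a nonzero weight.
    idf = {t: math.log(1.0 + len(docs) / df[t]) for t in vocab}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t, count in Counter(toks).items():
            A[index[t], j] = count * idf[t]
    return index, idf, A


def query_vector(text, index, idf):
    """Weight query terms the same way as document terms; unseen terms are ignored."""
    q = np.zeros(len(index))
    for t, count in Counter(tokenize(text)).items():
        if t in index:
            q[index[t]] = count * idf[t]
    return q


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is zero."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0


def lsa_scores(docs, go_term_text, k):
    """Cosine similarity between a GO term description and each document in a rank-k LSA space."""
    index, idf, A = build_term_doc_matrix(docs)
    # Truncated SVD A ~ U_k S_k V_k^T: keep only the k leading left singular vectors.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Project documents and query onto the same latent axes (for documents this equals V_k S_k).
    doc_coords = A.T @ U_k
    q_coords = query_vector(go_term_text, index, idf) @ U_k
    return [cosine(q_coords, d) for d in doc_coords]


def keyword_match(docs, phrase):
    """Simplified string-matching baseline: indices of docs containing the exact phrase."""
    p = phrase.lower()
    return [i for i, d in enumerate(docs) if p in d.lower()]


if __name__ == "__main__":
    print("keyword matches:", keyword_match(ABSTRACTS, "apoptosis"))
    for i, s in enumerate(lsa_scores(ABSTRACTS, "apoptosis", k=2)):
        print(f"abstract {i}: LSA similarity {s:.3f}")
```

With `k` smaller than the number of documents, the projection merges co-occurring terms such as "apoptosis" and "caspase", so an abstract like the second one, which never mentions the GO term name, may receive a nonzero similarity that the exact-phrase baseline cannot produce. In practice `k` would be tuned empirically.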