Resolving abbreviations to their senses in Medline
Open Access
- 21 July 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (18), 3658-3664
- https://doi.org/10.1093/bioinformatics/bti586
Abstract
Motivation: Biological literature contains many abbreviations, each of which has one particular sense within a given document. However, most abbreviations do not have a unique sense across the literature, and many documents do not contain the long forms of the abbreviations they use. Resolving an abbreviation in a document consists of retrieving the sense in use. Abbreviation resolution improves the accuracy of document retrieval engines and of information extraction systems.
Results: We combine an automatic analysis of Medline abstracts with linguistic methods to build a dictionary of abbreviation/sense pairs. The dictionary is used to resolve abbreviations that occur together with their long forms. Ambiguous global abbreviations, which occur without their long forms, are resolved using support vector machines trained on the context of each instance of the abbreviation/sense pairs previously extracted to build the dictionary. The system disambiguates abbreviations with a precision of 98.9% and a recall of 98.2% (98.5% accuracy), which is superior to previously reported results.
Availability: The abbreviation resolution module is available at http://www.ebi.ac.uk/Rebholz/software.html
Contact: gaudan@ebi.ac.uk
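The Results paragraph describes a two-stage idea: abbreviation/long-form pairs harvested from abstracts that spell out the long form build the dictionary, and those same instances double as labelled training data for SVMs that resolve global abbreviations. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' module. The pair extractor (`find_pairs`) uses a naive initial-letter rule in place of the paper's combination of automatic analysis and linguistic methods. Contexts are binary bag-of-words features over the whole abstract, with the short-form and long-form tokens hidden. Each sense gets a one-vs-rest linear SVM trained with the Pegasos sub-gradient method, chosen here only to keep the example dependency-free. All names (`AbbreviationResolver`, `train_pegasos`, `lam`, `epochs`) and the example abstracts are hypothetical. When the long form appears in the text, the sketch simply returns it rather than looking it up in the dictionary.

```python
import random
import re
from collections import defaultdict

# Short form in parentheses: a letter plus 1-9 letters, digits or hyphens (assumed limits).
PAREN_RE = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def find_pairs(text):
    """Yield (short form, long form) for 'long form (SF)' via a naive initial-letter rule."""
    for m in PAREN_RE.finditer(text):
        sf = m.group(1)
        letters = [c.lower() for c in sf if c.isalpha()]
        words = text[: m.start()].split()[-len(letters):]
        if len(words) == len(letters) and all(
            w[0].lower() == c for w, c in zip(words, letters)
        ):
            yield sf, " ".join(words).lower()


def features(text, drop):
    """Binary bag-of-words context, minus the tokens in `drop`, plus a bias feature."""
    return (set(TOKEN_RE.findall(text.lower())) - drop) | {"__bias__"}


def train_pegasos(xs, ys, positive, lam=0.01, epochs=20, seed=0):
    """One-vs-rest linear SVM (hinge loss + L2 penalty) trained with Pegasos SGD."""
    w = defaultdict(float)
    order = list(range(len(xs)))
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            y = 1.0 if ys[i] == positive else -1.0
            margin = y * sum(w.get(f, 0.0) for f in xs[i])
            for f in w:                      # shrink step from the L2 penalty
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                 # hinge loss active: move towards y * x
                for f in xs[i]:
                    w[f] += eta * y
    return w


class AbbreviationResolver:
    """Dictionary of harvested pairs plus per-abbreviation SVMs (hypothetical design)."""
    def __init__(self):
        self.senses = defaultdict(set)     # short form -> set of long forms
        self.examples = defaultdict(list)  # short form -> [(features, long form)]
        self.models = {}                   # short form -> {long form: weights}

    def add_abstract(self, text):
        for sf, lf in find_pairs(text):
            self.senses[sf].add(lf)
            drop = set(TOKEN_RE.findall(f"{sf} {lf}".lower()))  # hide the answer
            self.examples[sf].append((features(text, drop), lf))

    def train(self):
        """Fit one SVM per sense, for ambiguous short forms only."""
        for sf, senses in self.senses.items():
            if len(senses) > 1:
                xs = [x for x, _ in self.examples[sf]]
                ys = [y for _, y in self.examples[sf]]
                self.models[sf] = {s: train_pegasos(xs, ys, s) for s in senses}

    def resolve(self, sf, text):
        local = dict(find_pairs(text))
        if sf in local:                    # long form present in this text
            return local[sf]
        senses = self.senses.get(sf)
        if not senses:
            return None                    # abbreviation not in the dictionary
        if len(senses) == 1:
            return next(iter(senses))
        x = features(text, set(TOKEN_RE.findall(sf.lower())))
        scores = {s: sum(w.get(f, 0.0) for f in x) for s, w in self.models[sf].items()}
        return max(scores, key=scores.get)


if __name__ == "__main__":
    resolver = AbbreviationResolver()
    for abstract in (
        "Patients with multiple sclerosis (MS) showed demyelination of the brain and relapsing neurological symptoms.",
        "Relapsing remitting multiple sclerosis (MS) patients received interferon therapy for brain lesions.",
        "Peptides were identified by mass spectrometry (MS) after tryptic digestion and liquid chromatography.",
        "Protein samples were analysed with mass spectrometry (MS) using electrospray ionization and peptides.",
    ):
        resolver.add_abstract(abstract)
    resolver.train()
    for text in (
        "MS lesions in the brain of relapsing patients were imaged.",
        "Tryptic peptides were analysed by MS with electrospray ionization.",
    ):
        print(resolver.resolve("MS", text))  # multiple sclerosis, then mass spectrometry
```

Run as a script, the sketch learns the two senses of MS from four invented abstracts and prints `multiple sclerosis` and then `mass spectrometry` for the two sentences that lack a long form. Hiding the long-form tokens during training is the key design choice: global abbreviations appear without their long forms, so a classifier that had learned to rely on those tokens would have nothing to use at resolution time.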