Abbreviation definition identification based on automatic precision estimates
Open Access
- 25 September 2008
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (1), 402
- https://doi.org/10.1186/1471-2105-9-402
Abstract
The rapid growth of biomedical literature presents challenges for automatic text processing, and one of these challenges is abbreviation identification. Unrecognized abbreviations in text hinder indexing algorithms and adversely affect information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity. Because of the size of databases such as MEDLINE, only a small fraction of abbreviation-definition pairs can be examined manually, so an automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition, our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation.
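The ordering idea in the last sentence of the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two strategies (`first_letters`, `subsequence`), the function `identify_definition`, and the pseudo-precision values are all hypothetical placeholders. The abstract does not describe how pseudo-precisions are estimated, so they are simply supplied as numbers here. The sketch only shows strategies being tried from highest to lowest pseudo-precision until one returns a definition.

```python
from typing import Callable, List, Optional, Tuple

# A strategy takes (abbreviation, preceding text) and returns a definition or None.
Strategy = Callable[[str, str], Optional[str]]


def first_letters(abbr: str, text: str) -> Optional[str]:
    """Hypothetical strict strategy: each abbreviation letter must be the
    first letter of one of the last len(abbr) words, in order."""
    words = text.split()
    if not abbr or len(words) < len(abbr):
        return None
    tail = words[-len(abbr):]
    if all(w[0].lower() == c.lower() for w, c in zip(tail, abbr)):
        return " ".join(tail)
    return None


def subsequence(abbr: str, text: str) -> Optional[str]:
    """Hypothetical looser strategy: find the shortest trailing run of words
    that starts with the abbreviation's first letter and contains all of its
    letters in order."""
    if not abbr:
        return None
    words = text.split()
    for start in range(len(words) - 1, -1, -1):
        candidate = " ".join(words[start:])
        if candidate[0].lower() != abbr[0].lower():
            continue
        remaining = iter(candidate.lower())
        # `c in remaining` consumes the iterator, so this is an in-order match.
        if all(c in remaining for c in abbr.lower()):
            return candidate
    return None


def identify_definition(
    abbr: str,
    text: str,
    strategies: List[Tuple[str, float, Strategy]],
) -> Optional[Tuple[str, str, float]]:
    """Try strategies in descending pseudo-precision order and return
    (definition, strategy name, pseudo-precision) for the first hit."""
    ranked = sorted(strategies, key=lambda s: s[1], reverse=True)
    for name, pseudo_precision, strategy in ranked:
        definition = strategy(abbr, text)
        if definition is not None:
            return definition, name, pseudo_precision
    return None


# Placeholder pseudo-precision values, chosen only for illustration.
STRATEGIES = [
    ("subsequence", 0.80, subsequence),
    ("first_letters", 0.95, first_letters),
]

if __name__ == "__main__":
    print(identify_definition("HMM", "we trained a hidden Markov model", STRATEGIES))
    print(identify_definition("DNA", "a sample of deoxyribonucleic acid", STRATEGIES))
```

In this sketch the stricter strategy is tried first because it was assigned the higher placeholder score; the looser strategy only contributes a definition when the stricter one finds nothing, and its lower score is returned alongside the definition as a rough indication of reliability.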