ALICE: An Algorithm to Extract Abbreviations from MEDLINE
Open Access
- 19 May 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 12 (5), 576-586
- https://doi.org/10.1197/jamia.m1757
Abstract
Objective: To help biomedical researchers recognize abbreviations that are introduced dynamically in the biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations, together with their expansions, from a target paper on the fly.

Methods: ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. The system consists of three phases and can potentially identify 320 valid abbreviation-expansion patterns as combinations of the rules.

Results: It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database.

Conclusion: ALICE extracted abbreviations and their expansions from the literature efficiently. The carefully compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. By supporting the construction of an abbreviation database or dictionary, ALICE not only facilitates recognition of abbreviations left undefined in a paper but also makes biomedical literature retrieval more accurate. The system is freely available at http://uvdb3.hgc.jp/ALICE/ALICE_index.html.
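The abstract does not spell out ALICE's three phases or its 320 rule combinations, so the sketch below is not ALICE's method. It swaps in a simpler, well-known heuristic of the same family, Schwartz and Hearst's backward character matching for the "long form (SF)" pattern, to illustrate how a rule-based abbreviation-expansion extractor can work. The function names (`is_valid_short_form`, `best_long_form`, `extract_pairs`) are hypothetical, and the sketch handles only the "expansion (abbreviation)" order, not the reversed "abbreviation (expansion)" order.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper names; this is a Schwartz-Hearst-style sketch, not ALICE.
# Matches a parenthesised chunk such as "(TNF)"; nested parentheses are skipped.
PAREN = re.compile(r"\(([^()]+)\)")


def is_valid_short_form(sf: str) -> bool:
    """Rough short-form filter: 2-10 characters, at most two words,
    first character alphanumeric, at least one letter."""
    return (
        2 <= len(sf) <= 10
        and len(sf.split()) <= 2
        and sf[0].isalnum()
        and any(ch.isalpha() for ch in sf)
    )


def best_long_form(sf: str, candidate: str) -> Optional[str]:
    """Match short-form characters right-to-left against the candidate text.
    The first short-form character must also start a word in the candidate."""
    s, l = len(sf) - 1, len(candidate) - 1
    while s >= 0:
        ch = sf[s].lower()
        if not ch.isalnum():  # skip hyphens, slashes, etc. in the short form
            s -= 1
            continue
        # Walk left until the character matches (and, for the first
        # short-form character, sits at the beginning of a word).
        while l >= 0 and (
            candidate[l].lower() != ch
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None  # some short-form character could not be matched
        l -= 1
        s -= 1
    # Extend left to the start of the word containing the last match.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (abbreviation, expansion) pairs for 'long form (SF)' patterns."""
    pairs = []
    for m in PAREN.finditer(text):
        sf = m.group(1).strip()
        if not is_valid_short_form(sf):
            continue
        words = text[: m.start()].split()
        # Look back at most min(|SF| + 5, 2 * |SF|) words before the parenthesis.
        window = words[-min(len(sf) + 5, 2 * len(sf)):]
        lf = best_long_form(sf, " ".join(window))
        if lf and len(lf) > len(sf):
            pairs.append((sf, lf))
    return pairs
```

For example, `extract_pairs("Levels of tumor necrosis factor (TNF) were elevated.")` yields `[("TNF", "tumor necrosis factor")]`, while a parenthetical such as "(see below)" is rejected because its letters cannot be matched backwards. A richer rule set like ALICE's would add further patterns on top of a core matcher of this kind.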