Building an abbreviation dictionary using a term recognition approach
Open Access
- 18 October 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (24) , 3089-3095
- https://doi.org/10.1093/bioinformatics/btl534
Abstract
Motivation: Acronyms result from a highly productive type of term variation and trigger the need for an acronym dictionary to establish associations between acronyms and their expanded forms.
Results: We propose a novel method for recognizing acronym definitions in a text collection. Assuming a word sequence co-occurring frequently with a parenthetical expression to be a potential expanded form, our method identifies acronym definitions in a similar manner to the statistical term recognition task. Applied to the whole MEDLINE (7 811 582 abstracts), the implemented system extracted 886 755 acronym candidates and recognized 300 954 expanded forms in reasonable time. Our method outperformed baseline systems, achieving 99% precision and 82–95% recall on our evaluation corpus that roughly emulates the whole MEDLINE.
Availability and Supplementary information: The implementations and supplementary information are available at our web site.
Contact: okazaki@mi.ci.i.u-tokyo.ac.jp
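
The core idea in the Results paragraph, treating word sequences that frequently co-occur with a parenthetical expression as potential expanded forms, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the scoring function, so the sketch substitutes a simple stand-in (keep the longest preceding word sequence seen at least `min_freq` times). The names `PAREN`, `candidate_counts`, `best_expansions`, `max_words`, and `min_freq` are all hypothetical.

```python
import re
from collections import Counter

# Assumption: an acronym is a short alphanumeric token enclosed in parentheses.
PAREN = re.compile(r"\(([A-Za-z0-9][A-Za-z0-9-]{1,9})\)")


def candidate_counts(texts, max_words=5):
    """Count how often each word sequence directly precedes each parenthetical token."""
    counts = Counter()
    for text in texts:
        for match in PAREN.finditer(text):
            acronym = match.group(1)
            # Lowercased words appearing before the opening parenthesis.
            words = re.findall(r"[A-Za-z0-9-]+", text[:match.start()].lower())
            # Each suffix of the preceding words (up to max_words) is a candidate.
            for n in range(1, min(max_words, len(words)) + 1):
                counts[(acronym, " ".join(words[-n:]))] += 1
    return counts


def best_expansions(counts, min_freq=2):
    """Stand-in scoring: keep the longest candidate seen at least min_freq times."""
    best = {}
    for (acronym, phrase), freq in counts.items():
        if freq < min_freq:
            continue
        current = best.get(acronym)
        if current is None or len(phrase.split()) > len(current.split()):
            best[acronym] = phrase
    return best


texts = [
    "We measured hidden Markov model (HMM) accuracy.",
    "A hidden Markov model (HMM) was trained.",
    "The hidden Markov model (HMM) is used.",
]
print(best_expansions(candidate_counts(texts)))
```

Because the surrounding words differ between the three sentences, only "hidden markov model" and its shorter suffixes recur, which is the frequency signal the abstract describes. A real system would need a proper term-recognition score and checks that the expansion's letters match the acronym, neither of which is attempted here.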
This publication has 10 references indexed in Scilit:
- Resolving abbreviations to their senses in Medline. Bioinformatics, 2005
- ALICE: An Algorithm to Extract Abbreviations from MEDLINE. Journal of the American Medical Informatics Association, 2005
- Biomedical term mapping databases. Nucleic Acids Research, 2004
- SaRAD: a Simple and Robust Abbreviation Dictionary. Bioinformatics, 2004
- Mapping Abbreviations to Full Forms in Biomedical Articles. Journal of the American Medical Informatics Association, 2002
- Heuristics for Identification of Acronym-Definition Patterns within Text: Towards an Automated Construction of Comprehensive Acronym-Definition Dictionaries. Methods of Information in Medicine, 2002
- Extracting useful terms from parenthetical expressions by combining simple rules and statistical measures. Published by John Benjamins Publishing Company, 2001
- Recognizing acronyms and their definitions. International Journal on Document Analysis and Recognition (IJDAR), 1999
- The C-value/NC-value domain-independent method for multi-word term extraction. Journal of Natural Language Processing, 1999
- An algorithm for suffix stripping. Program: electronic library and information systems, 1980