Normalizing biomedical terms by minimizing ambiguity and variability
Open Access
- 11 April 2008
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 9 (S3) , S2
- https://doi.org/10.1186/1471-2105-9-s3-s2
Abstract
One of the difficulties in mapping biomedical named entities (e.g. genes, proteins, chemicals and diseases) to their concept identifiers stems from the potential variability of the terms. Soft string matching is one possible solution to the problem, but its high computational cost discourages its use when the dictionaries are large or when real-time processing is required. A less computationally demanding approach is to normalize the terms using heuristic rules, which allows a dictionary to be looked up in constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and is thus the bottleneck of the normalization approach.
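The contrast between the two approaches can be sketched as follows. This is a minimal illustration, assuming a tiny hypothetical dictionary (the concept IDs `C001` and `C002` are made up) and a few hypothetical hand-written rules (lowercasing, spelling out Greek letters, deleting punctuation and whitespace); these are not the rules proposed in the paper. Normalization maps each dictionary entry to a key in a hash table, so a query costs one normalization plus an expected constant-time lookup. Soft matching, shown here as a Levenshtein-distance scan, must compare the query against every entry.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary of (term, concept ID) pairs, for illustration only.
DICTIONARY: List[Tuple[str, str]] = [
    ("IL-2", "C001"),
    ("interleukin 2", "C001"),
    ("TNF-alpha", "C002"),
    ("tumor necrosis factor alpha", "C002"),
]

# Hypothetical heuristic rule: spell out a few Greek letters.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "κ": "kappa"}


def normalize(term: str) -> str:
    """Apply simple, hypothetical heuristic rules to a term."""
    t = term.lower()
    for symbol, name in GREEK.items():
        t = t.replace(symbol, name)
    # Delete whitespace, hyphens, underscores and other non-word characters.
    return re.sub(r"[\W_]+", "", t)


def build_index(entries: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each normalized key to its concept IDs (a key may be ambiguous)."""
    index: Dict[str, List[str]] = {}
    for term, cid in entries:
        ids = index.setdefault(normalize(term), [])
        if cid not in ids:
            ids.append(cid)
    return index


def lookup(index: Dict[str, List[str]], query: str) -> List[str]:
    """One hash-table lookup, independent of the dictionary size."""
    return list(index.get(normalize(query), []))


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def soft_lookup(entries: List[Tuple[str, str]], query: str, max_dist: int) -> List[str]:
    """Soft matching: scans every entry, so cost grows with dictionary size."""
    q = query.lower()
    return sorted({cid for term, cid in entries
                   if levenshtein(q, term.lower()) <= max_dist})


index = build_index(DICTIONARY)
print(lookup(index, "Interleukin-2"))          # variant spelling of C001
print(lookup(index, "TNF α"))                  # Greek symbol spelled out
print(soft_lookup(DICTIONARY, "IL-22", 1))     # within 1 edit of "IL-2"
```

The last call shows a risk of soft matching: "IL-22" is within one edit of "IL-2" even though it names a different molecule. Conversely, overly aggressive normalization rules can merge distinct terms under one key, which is why `build_index` stores a list of concept IDs per key rather than a single ID.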