Various criteria in the evaluation of biomedical named entity recognition

Open Access

24 February 2006

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 7 (1) , 92
https://doi.org/10.1186/1471-2105-7-92

Abstract

Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Two annotated corpora, GENIA and GENETAG, are most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems, and JNLPBA and BioCreAtIvE are two major Bio-NER tasks using these corpora. The two tasks take different approaches to corpus annotation and use different matching criteria to evaluate system performance. This paper details these differences and describes alternative criteria. We then examine the impact of different criteria and annotation schemes on system performance by retesting systems that participated in the two tasks.

To analyze the difference between JNLPBA's and BioCreAtIvE's evaluation, we conduct Experiment 1 to evaluate the top four JNLPBA systems using BioCreAtIvE's classification scheme. We then compare them with the top four BioCreAtIvE systems. Among them, three systems participated in both tasks, and each has a lower F-score on JNLPBA than on BioCreAtIvE. In Experiment 2, we apply hypothesis testing and correlation coefficients to find alternatives to BioCreAtIvE's evaluation scheme. The results show that the right-match and left-match criteria do not differ significantly from BioCreAtIvE's. In Experiment 3, we propose a customized relaxed-match criterion that uses right match and merges JNLPBA's five NE classes into two, which achieves an F-score of 81.5%. In Experiment 4, we evaluate a range of five matching criteria, from loose to strict, on the top JNLPBA system and examine the percentage of false negatives. Our experiment gives the relative change in precision, recall and F-score as matching criteria are relaxed.

In many applications, biomedical NEs could have several acceptable tags, which might differ only in their left or right boundaries. However, most corpora annotate only one of them. In our experiments, we found that right match and left match can be appropriate alternatives to JNLPBA's and BioCreAtIvE's matching criteria. In addition, our relaxed-match criterion demonstrates that users can define their own relaxed criteria that correspond more realistically to their application requirements.
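
The matching criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's evaluation code. It assumes that entities are given as token-index spans with a class label, that "left match" means the left boundary and the class agree, and that "right match" means the right boundary and the class agree. The `Entity` type, the `score` function, and the two-class grouping in `MERGED_CLASSES` are hypothetical names and choices made for illustration; the abstract states only that five classes are merged into two, not which classes are grouped together.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    start: int   # index of the first token of the mention
    end: int     # index of the last token of the mention (inclusive)
    label: str   # NE class, e.g. "protein"


# Assumed grouping of the five JNLPBA classes into two (illustrative only).
MERGED_CLASSES = {
    "protein": "molecule", "DNA": "molecule", "RNA": "molecule",
    "cell_line": "cell", "cell_type": "cell",
}


def matches(pred: Entity, gold: Entity, criterion: str) -> bool:
    """Return True if pred counts as correct for gold under the criterion."""
    if pred.label != gold.label:
        return False
    if criterion == "exact":
        return pred.start == gold.start and pred.end == gold.end
    if criterion == "left":
        return pred.start == gold.start
    if criterion == "right":
        return pred.end == gold.end
    raise ValueError(f"unknown criterion: {criterion}")


def score(preds, golds, criterion="exact", label_map: Optional[dict] = None):
    """Compute (precision, recall, F-score); each gold entity matches at most once."""
    if label_map is not None:
        preds = [p._replace(label=label_map.get(p.label, p.label)) for p in preds]
        golds = [g._replace(label=label_map.get(g.label, g.label)) for g in golds]
    unmatched = list(golds)
    true_positives = 0
    for p in preds:
        for g in unmatched:
            if matches(p, g, criterion):
                unmatched.remove(g)
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(golds) if golds else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data: one prediction misses the left boundary, the other has a
# different class than the gold annotation.
gold = [Entity(0, 2, "protein"), Entity(5, 6, "cell_type")]
pred = [Entity(1, 2, "protein"), Entity(5, 6, "cell_line")]

exact_result = score(pred, gold, "exact")
right_result = score(pred, gold, "right")
relaxed_result = score(pred, gold, "right", MERGED_CLASSES)
```

On this toy data, neither prediction counts under exact match, the first counts under right match, and both count once right match is combined with the merged classes. This mirrors, on a very small scale, how scores rise as the criterion is relaxed.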
