Exploring the boundaries: gene and protein identification in biomedical text
Open Access
- 24 May 2005
- journal article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (S1), 1-S5
- https://doi.org/10.1186/1471-2105-6-s1-s5
Abstract
Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
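The abstract reports precision and recall separately for the two evaluations. For a single-number comparison, the pairs can be combined into a balanced F-measure (the harmonic mean of precision and recall). The sketch below is illustrative only: the function name `f_measure` and the `reported` dictionary are assumptions introduced here, and the resulting scores are derived from the rounded figures above rather than quoted from the abstract.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 gives the balanced F1 score. The name and signature are
    illustrative assumptions, not taken from the paper.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall pairs as stated in the abstract (already rounded there).
reported = {"open": (0.83, 0.84), "closed": (0.78, 0.85)}

scores = {name: f_measure(p, r) for name, (p, r) in reported.items()}
for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

Because the inputs are rounded to two decimals, the derived scores are approximate and may differ slightly from any F-measure reported in the full article.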