Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

Open Access

23 March 2010

journal article
Published by Springer Nature in Journal of Cheminformatics

Vol. 2 (1), 3
https://doi.org/10.1186/1758-2946-2-3

Abstract

Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text, based on a number of publicly available databases, and tested it on an annotated corpus. To achieve acceptable recall and precision, we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships.
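
To make the evaluation terms concrete, the following is a minimal sketch of dictionary-based term identification scored against an annotated text. It is not the Chemlist pipeline: the function names `find_terms` and `precision_recall`, the three-term dictionary, the sentence, and the gold annotations are all hypothetical and chosen only for illustration.

```python
import re


def find_terms(text, dictionary):
    """Return the dictionary terms that occur in text as whole words (case-insensitive)."""
    found = set()
    for term in dictionary:
        # Require non-word characters (or string boundaries) around the term
        # so that, e.g., "lead" does not match inside "misleading".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term.lower())
    return found


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted terms against gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: "lead" is a chemical name that is also a common verb.
dictionary = {"aspirin", "ibuprofen", "lead"}
text = "Aspirin and paracetamol lead to different side effects."
gold = {"aspirin", "paracetamol"}  # assumed manual annotation of the sentence

predicted = find_terms(text, dictionary)
precision, recall = precision_recall(predicted, gold)
```

In this toy case the verb "lead" is matched as a chemical (a false positive that lowers precision) and "paracetamol" is absent from the dictionary (a false negative that lowers recall), which is the kind of error that disambiguation rules and dictionary curation are meant to reduce.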
