Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining
Open Access
- 23 March 2010
- journal article
- Published by Springer Nature in Journal of Cheminformatics
- Vol. 2 (1), 3
- https://doi.org/10.1186/1758-2946-2-3
Abstract
Previously, we developed a combined dictionary, dubbed Chemlist, for identifying small molecules and drugs in text. It was built from a number of publicly available databases and tested on an annotated corpus. To achieve acceptable recall and precision, we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships.
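To make the idea of dictionary-based term identification with a disambiguation rule concrete, here is a minimal sketch in Python. It is not the Chemlist pipeline: the dictionary entries, identifiers (`CHEM:0001` and so on), the stoplist, and the function names `find_terms` and `precision_recall` are all hypothetical. The sketch matches dictionary names case-insensitively on word boundaries (longest name first, no overlapping matches), applies one simple rule (skip names that are also common English words), and scores the result against gold-standard annotated spans.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary (surface name -> identifier); not taken from Chemlist or ChemSpider.
DICTIONARY: Dict[str, str] = {
    "aspirin": "CHEM:0001",
    "acetylsalicylic acid": "CHEM:0001",
    "caffeine": "CHEM:0002",
    "lead": "CHEM:0003",  # ambiguous: chemical element vs. English verb
    "ethanol": "CHEM:0004",
}

# Hypothetical disambiguation rule: ignore names that are also common English words.
STOPLIST: Set[str] = {"lead"}

Span = Tuple[int, int]


def find_terms(text: str, dictionary: Dict[str, str], stoplist: Set[str]) -> Set[Span]:
    """Return (start, end) character spans of dictionary matches in text.

    Longer names are tried first so that a multi-word name is not split
    into shorter overlapping matches.
    """
    spans: Set[Span] = set()
    taken: List[bool] = [False] * len(text)  # marks characters already covered by a match
    for term in sorted(dictionary, key=len, reverse=True):
        if term in stoplist:
            continue  # disambiguation rule: drop ambiguous common-word names
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.span()
            if not any(taken[start:end]):
                spans.add((start, end))
                for i in range(start, end):
                    taken[i] = True
    return spans


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-span precision and recall against gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "Aspirin and caffeine may lead to insomnia."
    gold = {(0, 7), (12, 20)}  # hand-annotated chemical mentions
    print(precision_recall(find_terms(text, DICTIONARY, STOPLIST), gold))
    print(precision_recall(find_terms(text, DICTIONARY, set()), gold))
```

On the example sentence, the stoplist removes the false positive "lead" (here a verb) and raises precision from 2/3 to 1.0 without lowering recall. The same rule would cost recall in a sentence where "lead" does denote the element, which is the kind of trade-off that disambiguation rules and dictionary curation have to balance.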