Automated Acquisition of Disease-Drug Knowledge from Biomedical and Clinical Documents: An Initial Study

Open Access

1 January 2008

journal article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 15 (1) , 87-98
https://doi.org/10.1197/jamia.m2401

Abstract

Objective: To explore the automated acquisition of knowledge from biomedical and clinical documents, using text mining and statistical techniques to identify disease-drug associations.

Design: Biomedical literature and clinical narratives from the patient record were mined to gather knowledge about disease-drug associations. Two NLP systems, BioMedLEE and MedLEE, were applied to Medline articles and discharge summaries, respectively. Disease and drug entities were identified using the NLP systems and, for the Medline articles, MeSH annotations as well. Focusing on eight diseases, co-occurrence statistics were applied to compute and evaluate the strength of association between each disease and relevant drugs.

Results: Ranked lists of disease-drug pairs were generated, and cutoffs were calculated to identify the stronger associations among these pairs for further analysis. Differences and similarities between the text sources (i.e., biomedical literature and patient record) and between the annotations (i.e., MeSH and NLP-extracted UMLS concepts) with regard to disease-drug knowledge were observed.

Conclusion: This paper presents a method for acquiring disease-specific knowledge and a feasibility study of the method. The method is based on applying a combination of NLP and statistical techniques to both biomedical and clinical documents. From the patient record, the approach extracted knowledge about the drugs clinicians are using for patients with specific diseases; from the literature, it acquired knowledge of the drugs frequently involved in controlled trials for those same diseases. In comparing the disease-drug associations, we found the results to be appropriate: the two text sources contained consistent as well as complementary knowledge, and manual review of the top five disease-drug associations by a medical expert supported their correctness across the diseases.
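As a rough illustration of the co-occurrence step described in the Design and Results sections, the sketch below ranks disease-drug pairs by a document-level association score. The abstract does not say which statistic was used, so the Pearson chi-square on a 2x2 contingency table is an assumption made here for illustration, as are the function names `chi_square_2x2` and `rank_pairs` and the input shape (one set of diseases and one set of drugs per document). It should not be read as the authors' implementation.

```python
from collections import Counter
from itertools import product


def chi_square_2x2(n_both, n_disease, n_drug, n_docs):
    """Pearson chi-square (no continuity correction) for a 2x2 co-occurrence table.

    Assumed statistic; the abstract only says "co-occurrence statistics".
    """
    a = n_both                                 # documents with disease and drug
    b = n_disease - n_both                     # disease without drug
    c = n_drug - n_both                        # drug without disease
    d = n_docs - n_disease - n_drug + n_both   # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association signal
    return n_docs * (a * d - b * c) ** 2 / denom


def rank_pairs(docs):
    """Rank positively associated disease-drug pairs, strongest first.

    docs: list of (set_of_diseases, set_of_drugs), one tuple per document
    (hypothetical input format, e.g. concepts extracted by an NLP system).
    """
    n_docs = len(docs)
    disease_counts, drug_counts, pair_counts = Counter(), Counter(), Counter()
    for diseases, drugs in docs:
        disease_counts.update(diseases)
        drug_counts.update(drugs)
        pair_counts.update(product(diseases, drugs))

    scored = []
    for (disease, drug), n_both in pair_counts.items():
        expected = disease_counts[disease] * drug_counts[drug] / n_docs
        if n_both <= expected:
            continue  # keep only pairs that co-occur more often than chance
        score = chi_square_2x2(
            n_both, disease_counts[disease], drug_counts[drug], n_docs
        )
        scored.append((disease, drug, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)
```

A cutoff for "stronger" associations, as mentioned in the Results, could then be applied to the third element of each returned tuple; how that cutoff is chosen is not specified in the abstract and is left open here.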
