Automated Acquisition of Disease-Drug Knowledge from Biomedical and Clinical Documents: An Initial Study
Open Access
- 1 January 2008
- journal article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 15 (1) , 87-98
- https://doi.org/10.1197/jamia.m2401
Abstract
Objective: Explore the automated acquisition of knowledge in biomedical and clinical documents using text mining and statistical techniques to identify disease-drug associations.
Design: Biomedical literature and clinical narratives from the patient record were mined to gather knowledge about disease-drug associations. Two NLP systems, BioMedLEE and MedLEE, were applied to Medline articles and discharge summaries, respectively. Disease and drug entities were identified using the NLP systems, in addition to MeSH annotations for the Medline articles. Focusing on eight diseases, co-occurrence statistics were applied to compute and evaluate the strength of association between each disease and relevant drugs.
Results: Ranked lists of disease-drug pairs were generated, and cutoffs were calculated to identify the stronger associations among these pairs for further analysis. Differences and similarities were observed between the text sources (i.e., biomedical literature and patient record) and between the annotations (i.e., MeSH and NLP-extracted UMLS concepts) with regard to disease-drug knowledge.
Conclusion: This paper presents a method for acquiring disease-specific knowledge and a feasibility study of the method. The method applies a combination of NLP and statistical techniques to both biomedical and clinical documents. Based on the patient record, the approach extracted knowledge about the drugs clinicians are using for patients with specific diseases; based on the literature, it acquired knowledge of drugs frequently involved in controlled trials for those same diseases. In comparing the disease-drug associations, we found the results to be appropriate: the two text sources contained consistent as well as complementary knowledge, and manual review of the top five disease-drug associations by a medical expert supported their correctness across the diseases.
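The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the co-occurrence statistic or the cutoff procedure, so this example assumes a Pearson chi-square computed over document-level co-occurrence counts; the function names and the input layout (one set of extracted concepts per document) are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product


def chi_square_2x2(n_both, n_disease, n_drug, n_docs):
    """Pearson chi-square for a 2x2 disease/drug contingency table.

    Assumption: chi-square is used here only as one common
    co-occurrence statistic; the abstract does not specify which
    statistic the authors applied.
    """
    a = n_both                                   # disease and drug
    b = n_disease - n_both                       # disease only
    c = n_drug - n_both                          # drug only
    d = n_docs - n_disease - n_drug + n_both     # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n_docs * (a * d - b * c) ** 2 / denom


def rank_pairs(docs, diseases, drugs):
    """Score every disease-drug pair that co-occurs in at least one document.

    docs     -- iterable of concept collections, one per document
    diseases -- set of disease concept names
    drugs    -- set of drug concept names
    Returns (score, disease, drug) tuples, strongest association first.
    """
    docs = [set(concepts) for concepts in docs]
    concept_counts = Counter()
    pair_counts = Counter()
    for concepts in docs:
        concept_counts.update(concepts)
        # Count each disease-drug pair once per document.
        for pair in product(concepts & diseases, concepts & drugs):
            pair_counts[pair] += 1
    scored = [
        (chi_square_2x2(n, concept_counts[dis], concept_counts[drug], len(docs)),
         dis, drug)
        for (dis, drug), n in pair_counts.items()
    ]
    return sorted(scored, reverse=True)
```

Calling `rank_pairs` once on concepts extracted from Medline articles and once on concepts extracted from discharge summaries would give two ranked lists that can be compared, in the spirit of the comparison the abstract reports. A cutoff on the score would then separate stronger from weaker associations; how the authors chose their cutoffs is not stated in the abstract.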