Extracting and characterizing gene–drug relationships from the literature
- 1 September 2004
- journal article
- research article
- Published by Wolters Kluwer Health in Pharmacogenetics
- Vol. 14 (9) , 577-586
- https://doi.org/10.1097/00008571-200409000-00002
Abstract
A fundamental task of pharmacogenetics is to collect and classify relationships between genes and drugs. Currently, this useful information has not been comprehensively aggregated in any database and remains scattered throughout the published literature. Although there are efforts to collect this information manually, they are limited by the size of the published literature on gene–drug relationships. Therefore, we investigated computational methods to extract and characterize pharmacogenetic relationships between genes and drugs from the literature. We first evaluated the effectiveness of the co-occurrence method in identifying related genes and drugs. We then used supervised machine learning algorithms to classify the relationships between genes and drugs from the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) into five categories that have been defined by active pharmacogenetic researchers as relevant to their work. The final co-occurrence algorithm was able to extract 78% of the related genes and drugs that were published in a review article from the literature. Our algorithm subsequently classified the relationships between genes and drugs from the PharmGKB into five categories with 74% accuracy. We have made the data available on a supplementary website at http://bionlp.stanford.edu/genedrug/. Gene–drug relationships can be accurately extracted from text and classified into categories. Although the relationships that we have identified do not capture the details and fine distinctions often made in the literature, these methods will help scientists to track the ever-growing literature and create information resources to support future discoveries.
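The abstract does not spell out how the co-occurrence method was implemented, so the following is only a minimal sketch of the general idea: a gene and a drug are treated as candidate-related when both names appear in the same text. The function names (`cooccurrences`, `recall`), the toy texts, the name lists, and the gold-standard pairs are all hypothetical and chosen for illustration; they are not the authors' data or their final algorithm.

```python
import re
from collections import Counter
from itertools import product


def cooccurrences(texts, genes, drugs):
    """Count how many texts mention each (gene, drug) pair together.

    Assumption: simple case-insensitive whole-token matching against
    fixed name lists. A real system would need synonym handling.
    """
    counts = Counter()
    for text in texts:
        # Split on anything that is not a letter or digit.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.lower()))
        found_genes = sorted(g for g in genes if g.lower() in tokens)
        found_drugs = sorted(d for d in drugs if d.lower() in tokens)
        for pair in product(found_genes, found_drugs):
            counts[pair] += 1
    return counts


def recall(predicted_pairs, gold_pairs):
    """Fraction of gold-standard pairs that were recovered."""
    gold = set(gold_pairs)
    return len(set(predicted_pairs) & gold) / len(gold)


# Hypothetical toy corpus, not taken from the study's data.
texts = [
    "CYP2D6 poor metabolizers were frequent among patients with "
    "metoprolol-associated adverse effects.",
    "Pharmacokinetics of losartan in relation to the CYP2C9 genotype.",
    "Simvastatin treatment in survivors of myocardial infarction.",
]
genes = ["CYP2D6", "CYP2C9", "APOE"]
drugs = ["metoprolol", "losartan", "simvastatin"]

counts = cooccurrences(texts, genes, drugs)

# Hypothetical gold standard standing in for pairs listed in a review article.
gold = [("CYP2D6", "metoprolol"), ("CYP2C9", "losartan"), ("APOE", "simvastatin")]
print(counts)
print(recall(counts.keys(), gold))
```

In this toy run the third text names a drug but no gene, so the pair involving it is missed. Measuring recall against pairs drawn from a review article, as the abstract describes, would expose exactly this kind of gap.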