Literature-Based Generation of Hypotheses on Chemical Composition Using Database Co-occurrence of Chemical Compounds

29 July 2005

Journal article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling

Vol. 45 (5), 1153–1158
https://doi.org/10.1021/ci049716u

Abstract

The candidate identities of unknown constituents in a sample under chemical analysis are hypotheses. It is proposed that these hypotheses be generated from the co-occurrence of other chemical compounds with a known sample constituent in the chemical literature. The efficiency of this co-occurrence approach for predicting chemical composition was tested on 67 impurities in 17 chemical/pharmaceutical products. The relative co-occurrence of the impurities with these products in the Chemical Abstracts Service database was evaluated and compared with the corresponding values for several reference groups of probability-sampled compounds from the literature. Almost all impurities (97%), but at most 8% of the randomly sampled compounds, co-occurred with these products. The mean and median relative co-occurrences of the impurities are much higher than those of the probability-sampled compounds that did co-occur with the products. When the impurities were combined with a probability sample of 396 interfering compounds, the power to predict chemical composition from the highest co-occurrences was 0.49–0.59. The co-occurrence value can also be regarded as an “empirical” indicator of chemical similarity, useful for generating new hypotheses about relationships both between compounds and between compounds and their properties.
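The abstract does not define relative co-occurrence or predictive power, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes each compound is represented by the set of database records (document IDs) that index it, takes relative co-occurrence to be the share of a candidate's records that also index the known product, and uses a top-k hit rate as a stand-in for the paper's predictive power. The function names and the toy index are hypothetical; they are neither the Chemical Abstracts Service interface nor data from the study.

```python
"""Hypothetical sketch: rank candidate constituents by literature co-occurrence.

Assumptions (not taken from the paper): each compound is represented by the
set of document IDs that index it, and "relative co-occurrence" is taken to be
|docs(product) & docs(candidate)| / |docs(candidate)|.
"""
from typing import Dict, List, Set, Tuple


def relative_cooccurrence(product_docs: Set[str], candidate_docs: Set[str]) -> float:
    """Share of the candidate's documents that also index the product (assumed definition)."""
    if not candidate_docs:
        return 0.0
    return len(product_docs & candidate_docs) / len(candidate_docs)


def rank_candidates(product_docs: Set[str], index: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return (compound, score) pairs, highest relative co-occurrence first; ties broken by name."""
    scores = [(name, relative_cooccurrence(product_docs, docs)) for name, docs in index.items()]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


def top_k_hit_rate(ranking: List[Tuple[str, float]], true_constituents: Set[str], k: int) -> float:
    """Fraction of the k top-ranked candidates that are real constituents (a stand-in metric)."""
    top = [name for name, _ in ranking[:k]]
    if not top:
        return 0.0
    return sum(name in true_constituents for name in top) / len(top)


if __name__ == "__main__":
    # Toy, invented document index; not data from the study.
    product = {"d1", "d2", "d3", "d4"}
    index = {
        "impurity_A": {"d1", "d2", "d9"},
        "impurity_B": {"d3", "d10", "d11", "d12"},
        "random_X": {"d20", "d21"},
        "random_Y": {"d4", "d30", "d31", "d32", "d33"},
    }
    ranking = rank_candidates(product, index)
    print(ranking)  # impurity_A (2/3), impurity_B (1/4), random_Y (1/5), random_X (0)
    print(top_k_hit_rate(ranking, {"impurity_A", "impurity_B"}, k=3))  # 2 of top 3
```

Dividing by the candidate's own record count keeps ubiquitous compounds (common solvents or reagents) from ranking highly merely because they are indexed everywhere; other normalizations, such as a Jaccard index over the two record sets, would be equally plausible choices under these assumptions.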
