Abstract
Candidates for identifying unknown constituents of a sample undergoing chemical analysis are, by nature, hypotheses. It is proposed to generate these hypotheses from the co-occurrence of chemical compounds with a known sample constituent in the chemical literature. The efficiency of the co-occurrence approach for predicting chemical compositions was tested for 67 impurities in 17 chemical/pharmaceutical products. The relative co-occurrence of the impurity compounds with these products in the Chemical Abstracts Service database was evaluated and compared with the corresponding values for several reference groups of compounds drawn from the literature by probability sampling. Almost all impurities (97%) but only ≤8% of the randomly sampled compounds co-occurred with these products. The mean and median values of relative co-occurrence for the impurities are much higher than those for the probability-sampled compounds that co-occurred with the products. For the combination of the impurities and the probability sample of 396 interfering compounds, the power to predict the chemical composition using the highest co-occurrences is 0.49–0.59. The co-occurrence value can also be considered an “empiric” indicator of chemical similarity, useful for generating new hypotheses about relationships both between compounds and between compounds and their properties.
