Abstract
Recursive Partitioning (RP) was used to analyze a heterogeneous data set of μ receptor High Throughput Screening results, comprising combinatorial libraries, lead optimization products and reference opiate ligands from the literature. Different sets of molecular descriptors and various parameterization schemes have been systematically assessed in search of an optimal RP tree, i.e. the tree that best discriminates between μ receptor ligands and inactive molecules. This discriminating ability has been evaluated in terms of a quality criterion (an enrichment factor corrected by the retrieval rate of active compounds), both for the learning set, with and without a cross-validation test, and for a distinct validation set. The non-linearity of the approach, together with the very large number of degrees of freedom of the models, makes the statistical analysis, and in particular the detection of overfitting, quite difficult. The advantages and disadvantages of the RP approach are discussed on the basis of a comparative analysis of model performance under the studied conditions. Finally, the features highlighted by the RP model as essential determinants of activity at the μ receptor are analyzed in the light of commonly accepted μ receptor binding hypotheses. The RP model considered to be “optimal”, owing to its simultaneous success in the cross-validation and validation simulations, is shown to have “discovered” the two main μ ligand families (“morphine-like” and “meperidine-like”), which appear as the two main “active nodes” of this tree.
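The quality criterion is described only qualitatively above. The minimal sketch below makes two assumptions, neither of which is the authors' stated definition. First, the enrichment factor is taken to be the hit rate among compounds assigned to "active" nodes divided by the hit rate of the whole data set. Second, the "correction" is taken to be a plain multiplication by the retrieval rate (recall). The function names are hypothetical.

```python
def enrichment_factor(sel_active: int, sel_total: int,
                      all_active: int, all_total: int) -> float:
    """Hit rate of the selected subset divided by the hit rate of the whole set."""
    if sel_total == 0 or all_active == 0:
        return 0.0
    # Written as a single ratio of integer products to limit rounding error.
    return (sel_active * all_total) / (sel_total * all_active)


def retrieval_rate(sel_active: int, all_active: int) -> float:
    """Fraction of all active compounds that end up in the selected subset."""
    return sel_active / all_active if all_active else 0.0


def quality(sel_active: int, sel_total: int,
            all_active: int, all_total: int) -> float:
    """Hypothetical combined criterion: enrichment factor weighted by recall.

    ASSUMPTION: a simple product. The paper's exact correction may differ.
    """
    ef = enrichment_factor(sel_active, sel_total, all_active, all_total)
    return ef * retrieval_rate(sel_active, all_active)


if __name__ == "__main__":
    # Hypothetical example: 1000 compounds with 100 actives. The tree's
    # active nodes hold 50 compounds, 40 of which are active.
    print(enrichment_factor(40, 50, 100, 1000))  # 8.0
    print(retrieval_rate(40, 100))               # 0.4
    print(quality(40, 50, 100, 1000))
```

Weighting by recall penalizes a tree that achieves a high enrichment factor only by isolating a handful of actives in a very small node. That is presumably the motivation for correcting the raw enrichment factor.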