COREPA‐M: A Multi‐Dimensional Formulation of COREPA

Abstract
The COmmon REactivity PAttern (COREPA) approach was recently developed as a probabilistic classification method, formalized specifically to advance mechanistic QSAR development by addressing the impact of molecular flexibility on the stereoelectronic properties of chemicals. In the initial version of COREPA, the probability distribution of only one stereoelectronic parameter at a time was analyzed for the series of chemicals under study. Going beyond one parameter at a time requires the capability to analyze a suite of parameters simultaneously for each chemical. This work creates that capability in a multi‐dimensional formulation of COREPA, which is expected to enhance the reliability of the method in discriminating complex patterns. Using probability distance measures such as the Kullback‐Leibler divergence and the Hellinger distance, the set of parameters that best discriminates activity is identified. The COREPA‐M system automatically identifies the parameters that best discriminate chemicals in groups defined by comparable reactivity endpoints. A detailed Bayesian decision tree is then used to classify untested chemicals, with accompanying “goodness of fit” measures. COREPA‐M is illustrated by modelling the binding affinity of chemicals at the aryl hydrocarbon receptor.
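As a rough illustration of the two probability distance measures named above, the following minimal Python sketch computes the Kullback‐Leibler divergence and the Hellinger distance between two discrete distributions and ranks parameters by how well they separate two groups of chemicals. It is not the COREPA‐M implementation: the histogram discretization, the smoothing constant `eps`, the function names, the parameter names `param_A` and `param_B`, and the bin counts are all assumptions made for the example.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn non-negative bin counts into a probability vector.
    eps is an assumed smoothing constant that avoids log(0) in KL."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; asymmetric in p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hellinger_distance(p, q):
    """Hellinger distance; symmetric and bounded in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2.0)

def rank_parameters(active_hists, inactive_hists):
    """Rank parameters by the Hellinger distance between the two groups
    (an assumed ranking rule chosen for illustration)."""
    scores = {}
    for name in active_hists:
        p = normalize(active_hists[name])
        q = normalize(inactive_hists[name])
        scores[name] = hellinger_distance(p, q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up bin counts for two hypothetical stereoelectronic parameters.
active = {"param_A": [1, 2, 10, 7], "param_B": [5, 5, 5, 5]}
inactive = {"param_A": [9, 8, 2, 1], "param_B": [5, 6, 4, 5]}
ranking = rank_parameters(active, inactive)
```

In this toy setting `ranking` lists the parameters from most to least discriminating. The Hellinger distance is used for the ranking because it is symmetric and bounded; the Kullback‐Leibler divergence could be substituted where a directed measure is preferred.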