Onion design and its application to a pharmaceutical QSAR problem

Abstract
Statistical molecular design (SMD) is an efficient tool for selecting informative, representative and diverse sets of molecular structures for use in QSAR, combinatorial technologies and other research areas that depend on the optimization of molecular properties. Onion design is a recent addition to the many designs in the SMD toolbox. It is a flexible approach that combines the best properties of other design families, notably the model support property of D‐optimal design and the uniform coverage ability of space‐filling design. Onion design splits the candidate set into a number of subsets (‘shells’ or ‘layers’) and makes a D‐optimal selection from each shell. This makes it possible to select representative sets of molecular structures throughout any property space with reasonable design sizes. The number of selected molecules is easily controlled by varying (i) the number of shells and (ii) the model on which the design is based. This paper reports the application of onion design to a pharmaceutical QSAR problem. The example data set contains 967 drug‐like molecules, and the biological activity under investigation is inhibition of cytochrome P450 3A4, a major human drug‐metabolizing enzyme. Onion design is used to select an informative training set, and QSAR models are built with multivariate data analysis tools. Copyright © 2004 John Wiley & Sons, Ltd.
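The core idea of onion design (split the candidate set into shells, then make a D‐optimal selection in each shell) can be illustrated with a small Python sketch. It rests on several assumptions that the abstract does not state. Shells are formed by ranking candidates on Euclidean distance from the centroid of the descriptor space (for example, principal component scores) and cutting that ranking into roughly equal groups. The D‐optimal step is a simple greedy sequential heuristic that maximizes det(X'X + εI), used here in place of the exchange algorithms normally applied to D‐optimal selection. Exactly p compounds (the number of model coefficients) are taken from each shell. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def model_row(x: Point, quadratic: bool = False) -> List[float]:
    """Expand a descriptor vector into one row of the model matrix X.

    Linear model: [1, x1, ..., xk]. With quadratic=True, all squares and
    two-factor interactions are appended (an illustrative model choice).
    """
    row = [1.0] + list(x)
    if quadratic:
        k = len(x)
        for i in range(k):
            for j in range(i, k):
                row.append(x[i] * x[j])
    return row


def det(a: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [r[:] for r in a]
    n = len(m)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-15:
            return 0.0
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d


def info_det(rows: List[List[float]], p: int, ridge: float = 1e-6) -> float:
    """det(X'X + ridge*I); the small ridge keeps early greedy steps non-singular."""
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    return det(xtx)


def greedy_d_optimal(rows: List[List[float]], n_select: int) -> List[int]:
    """Greedily add the row that most increases det(X'X) (a heuristic, not exact)."""
    p = len(rows[0])
    chosen: List[int] = []
    remaining = list(range(len(rows)))
    for _ in range(min(n_select, len(rows))):
        best = max(remaining,
                   key=lambda i: info_det([rows[j] for j in chosen] + [rows[i]], p))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def split_into_shells(points: List[Point], n_shells: int) -> List[List[int]]:
    """Rank candidates by distance from the centroid and cut into n_shells groups."""
    k = len(points[0])
    centre = [sum(pt[i] for pt in points) / len(points) for i in range(k)]
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], centre))
    bounds = [s * len(order) // n_shells for s in range(n_shells + 1)]
    return [order[bounds[s]:bounds[s + 1]] for s in range(n_shells)]


def onion_design(points: List[Point], n_shells: int, quadratic: bool = False) -> List[int]:
    """Select p candidates per shell, where p is the number of model coefficients."""
    selected: List[int] = []
    for shell in split_into_shells(points, n_shells):
        rows = [model_row(points[i], quadratic) for i in shell]
        picks = greedy_d_optimal(rows, len(rows[0]))
        selected.extend(shell[j] for j in picks)
    return selected


if __name__ == "__main__":
    # Hypothetical candidate set: an 11 x 11 grid in a two-descriptor space.
    grid = [(x / 5 - 1, y / 5 - 1) for x in range(11) for y in range(11)]
    print(len(onion_design(grid, 3)), len(onion_design(grid, 3, quadratic=True)))
```

With two descriptors, a linear model has p = 3 coefficients, so three shells give 3 × 3 = 9 selected compounds; a full quadratic model has p = 6, giving 3 × 6 = 18. This mirrors the two levers named in the abstract: the number of shells and the model on which the design is based.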