Automated Pharmacophore Identification for Large Chemical Data Sets

Abstract
The identification of three-dimensional pharmacophores from large, heterogeneous data sets remains an unsolved problem. We developed a novel program for this purpose, SCAMPI (statistical classification of activities of molecules for pharmacophore identification). It combines a fast conformation search with recursive partitioning, a data-mining technique that can easily handle large data sets. The pharmacophore identification process is designed to run recursively, and the conformation spaces are resampled under the constraints of the evolving pharmacophore model. The program can derive pharmacophores from a data set of 1000–2000 compounds, with thousands of conformations generated for each compound, in less than 1 day of computational time. For two test data sets, the identified pharmacophores are consistent with known results from the literature.
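
As an illustration of the recursive-partitioning step only, the following minimal Python sketch splits a toy set of compounds on the presence or absence of binary pharmacophore descriptors. It is not SCAMPI's implementation: the descriptor labels, the between-group sum-of-squares score, the stopping rule, and the names best_split and partition are all assumptions made for this example, and the conformation search and resampling are not shown.

```python
from statistics import mean

def best_split(compounds, features, min_size):
    """Pick the descriptor whose presence/absence best separates activities.

    Score (an assumption for this sketch): between-group sum of squares,
    n1 * n2 / n * (mean1 - mean2)^2. Returns None if no split helps.
    """
    best, best_score = None, 0.0
    n = len(compounds)
    for f in features:
        has = [c["activity"] for c in compounds if f in c["features"]]
        lacks = [c["activity"] for c in compounds if f not in c["features"]]
        if len(has) < min_size or len(lacks) < min_size:
            continue  # skip splits that leave a group too small
        score = len(has) * len(lacks) / n * (mean(has) - mean(lacks)) ** 2
        if score > best_score:
            best, best_score = f, score
    return best

def partition(compounds, features, depth=0, max_depth=3, min_size=2):
    """Recursively split compounds; each node records size and mean activity."""
    node = {"n": len(compounds),
            "mean_activity": mean(c["activity"] for c in compounds)}
    if depth >= max_depth:
        return node
    f = best_split(compounds, features, min_size)
    if f is None:
        return node  # leaf: no admissible split improves separation
    rest = [g for g in features if g != f]
    has = [c for c in compounds if f in c["features"]]
    lacks = [c for c in compounds if f not in c["features"]]
    node["feature"] = f
    node["present"] = partition(has, rest, depth + 1, max_depth, min_size)
    node["absent"] = partition(lacks, rest, depth + 1, max_depth, min_size)
    return node

# Hypothetical toy data: each descriptor stands for a feature pair at a
# distance found in at least one conformation of the compound.
compounds = [
    {"features": {"donor-acceptor@5A", "aromatic-donor@7A"}, "activity": 1.0},
    {"features": {"donor-acceptor@5A"}, "activity": 1.0},
    {"features": {"donor-acceptor@5A", "aromatic-acceptor@4A"}, "activity": 1.0},
    {"features": {"aromatic-donor@7A"}, "activity": 0.0},
    {"features": {"aromatic-acceptor@4A"}, "activity": 0.0},
    {"features": set(), "activity": 0.0},
]
features = sorted(set().union(*(c["features"] for c in compounds)))
tree = partition(compounds, features)
```

In this toy set the root split falls on the one descriptor shared by all active compounds, and the chain of descriptors along the path to an active leaf can be read as a candidate pharmacophore. In a recursive scheme such as the one described above, that candidate would then constrain the next round of conformation sampling.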