Automated Descriptor Selection and Hyperstructure Generation to Assist SAR Studies

Abstract
This paper summarises the results of recent research at the University of Sheffield into the identification of topological pharmacophores and toxicophores. The published CASE and LOGANA methods were implemented, and several limitations were revealed. The CASE method was improved by using atom pairs as descriptors, and the LOGANA method by using hyperstructures to reduce the number of variables available for combination. The paper outlines improvements to the method for hyperstructure generation and describes the advantages of hyperstructures for data analysis, compression and visualisation. In the resulting scheme of operation, large datasets can be clustered on the basis of very general fragments, and the hyperstructures of the clusters are then input to LOGANA to highlight activating and inactivating substructures.
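
To make the notion of an atom-pair descriptor concrete, the following minimal Python sketch counts atom pairs in a small hydrogen-suppressed molecular graph. It is an illustration only, not the implementation used in this work. The function names (`bond_distances`, `atom_pairs`), the simplified atom type (element symbol plus number of heavy-atom neighbours) and the choice to measure separation in bonds are all assumptions made for the example.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def atom_pairs(elements, bonds):
    """Count atom-pair descriptors of the form (type, separation in bonds, type)."""
    adjacency = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Assumed, simplified atom type: element symbol plus heavy-atom degree.
    types = [f"{el}{len(adjacency[i])}" for i, el in enumerate(elements)]

    counts = Counter()
    for i in range(len(elements)):
        dist = bond_distances(adjacency, i)
        for j in range(i + 1, len(elements)):
            if j in dist:  # skip atoms in a disconnected component
                # Sort the two types so that each pair has one canonical form.
                first, second = sorted((types[i], types[j]))
                counts[(first, dist[j], second)] += 1
    return counts


# Ethanol as a hydrogen-suppressed graph: C-C-O
ethanol = atom_pairs(["C", "C", "O"], [(0, 1), (1, 2)])
```

Each key of the returned counter is one descriptor, so molecules of different sizes can be compared on a common set of variables. A fuller atom typing scheme would change only the line that builds `types`.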