Using Recursive Partitioning to Analyze a Large SAR Data Set

1 January 1998

journal article
research article
Published by Taylor & Francis in SAR and QSAR in Environmental Research

Vol. 8 (3-4), pp. 183-193
https://doi.org/10.1080/10629369808039140

Abstract

Large data sets are becoming much more prevalent in drug discovery as high-throughput screening and combinatorial chemistry data sets become available. Conventional linear parametric methods (linear regression, principal component regression (PCR), partial least squares (PLS), etc.) often fail when applied to structurally heterogeneous data sets, because the underlying relationships may involve nonlinearities, thresholds and interactions, all of which considerably impede linear additive modeling approaches. In addition, the conventional assumption that all compounds in a data set act by the same mechanism is not expected to hold. Recursive partitioning (RP) can accommodate all of these difficulties. It is also computationally fast and can deal with very large data sets: 10,000 to 100,000 observations pose no particular problems. RP therefore invites investigation as a general approach for studying structure-activity relationships in large chemistry data sets. The purpose of this paper is to explicate a recursive partitioning procedure, FIRM, through the analysis of a large structure-activity data set: 1650 compounds described by 153 fragment descriptors, together with their monoamine oxidase (MAO) activity. The methodology is successful in uncovering compounds acting through different mechanisms.
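
To make the idea of recursive partitioning concrete, here is a minimal sketch of a generic regression tree on binary (present/absent) fragment descriptors. It is not FIRM: FIRM's own splitting and stopping rules are not given in the abstract, so this sketch substitutes a plain greedy criterion that picks, at each node, the descriptor whose split most reduces the sum of squared deviations of activity. The names `grow`, `predict`, `Node`, `min_node` and `min_gain` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Node:
    n: int                            # number of compounds in this node
    mean_activity: float              # node prediction: mean activity
    feature: Optional[int] = None     # index of splitting descriptor (None = leaf)
    absent: Optional["Node"] = None   # child where descriptor == 0
    present: Optional["Node"] = None  # child where descriptor == 1


def sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def grow(X: List[List[int]], y: List[float], min_node: int = 5,
         min_gain: float = 1e-9) -> Node:
    """Greedily split on the binary descriptor that most reduces SSE (assumed criterion)."""
    node = Node(n=len(y), mean_activity=mean(y))
    if len(y) < 2 * min_node:         # too few compounds to split further
        return node
    parent = sse(y)
    best = None                       # (gain, descriptor index)
    for j in range(len(X[0])):
        left = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        if len(left) < min_node or len(right) < min_node:
            continue
        gain = parent - sse(left) - sse(right)
        if best is None or gain > best[0]:
            best = (gain, j)
    if best is None or best[0] <= min_gain:
        return node                   # no worthwhile split: leaf
    j = best[1]
    rows0 = [i for i, xi in enumerate(X) if xi[j] == 0]
    rows1 = [i for i, xi in enumerate(X) if xi[j] == 1]
    node.feature = j
    node.absent = grow([X[i] for i in rows0], [y[i] for i in rows0], min_node, min_gain)
    node.present = grow([X[i] for i in rows1], [y[i] for i in rows1], min_node, min_gain)
    return node


def predict(node: Node, x: List[int]) -> float:
    """Follow the splits down to a leaf and return its mean activity."""
    while node.feature is not None:
        node = node.present if x[node.feature] == 1 else node.absent
    return node.mean_activity


if __name__ == "__main__":
    # Toy data (invented for illustration): activity is high only when
    # fragment 0 is present AND fragment 1 is absent; fragment 2 is noise.
    X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
    y = [1.0 if r[0] == 1 and r[1] == 0 else 0.0 for r in X]
    tree = grow(X, y)
    print(predict(tree, [1, 0, 0]))  # 1.0
    print(predict(tree, [1, 1, 0]))  # 0.0
```

The toy activity pattern is a pure interaction: an additive model in fragments 0 and 1 alone cannot reproduce it exactly, whereas two successive splits isolate the active subgroup. This is the kind of structure the abstract attributes to heterogeneous SAR data.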
