Using Recursive Partitioning to Analyze a Large SAR Data Set

Abstract
Large data sets are becoming much more prevalent in drug discovery as high-throughput screening and combinatorial chemistry data become available. Conventional linear parametric methods (linear regression, PCR, PLS, etc.) often fail when applied to structurally heterogeneous data sets. The underlying relationships may involve nonlinearities, thresholds, and interactions, all of which considerably impede linear additive modeling. In addition, the conventional assumption that all compounds in a data set act by the same mechanism is not expected to hold. Recursive partitioning (RP) can accommodate all of these difficulties. It is also computationally fast and can deal with very large data sets: 10 to 100k observations pose no particular problems. RP therefore invites investigation as a general approach to the study of structure-activity relationships in large chemistry data sets. The purpose of this paper is to explicate a recursive partitioning procedure, FIRM, through the analysis of a large structure-activity data set: 1650 compounds with 153 fragment descriptors and measured monoamine oxidase (MAO) activity. The methodology is successful in uncovering compounds acting through different mechanisms.
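To illustrate why recursive partitioning copes with interactions that defeat additive models, here is a minimal Python sketch. It is not FIRM: the abstract does not describe FIRM's splitting rule, so this sketch assumes a plain reduction in the sum of squared errors as the split criterion. The function names, the toy data, and the parameters `min_size`, `max_depth`, and `min_gain` are all hypothetical. The toy activity is 1.0 only when fragments 0 and 1 are both present, which is a pure interaction.

```python
from itertools import product
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, activity, min_size=2):
    """Return (descriptor index, SSE reduction) for the best presence/absence split.

    Assumption: descriptors are 0/1 fragment indicators. Returns None when no
    descriptor leaves at least min_size compounds on both sides.
    """
    parent = sse(activity)
    best = None
    for j in range(len(rows[0])):
        present = [y for x, y in zip(rows, activity) if x[j] == 1]
        absent = [y for x, y in zip(rows, activity) if x[j] == 0]
        if len(present) < min_size or len(absent) < min_size:
            continue
        gain = parent - sse(present) - sse(absent)
        if best is None or gain > best[1]:
            best = (j, gain)
    return best

def grow(rows, activity, depth=0, max_depth=3, min_gain=1e-9):
    """Recursively split the compounds; return a nested dict describing the tree."""
    if depth < max_depth:
        split = best_split(rows, activity)
        if split is not None and split[1] > min_gain:
            j = split[0]
            yes = [(x, y) for x, y in zip(rows, activity) if x[j] == 1]
            no = [(x, y) for x, y in zip(rows, activity) if x[j] == 0]
            return {
                "descriptor": j,
                "present": grow([x for x, _ in yes], [y for _, y in yes],
                                depth + 1, max_depth, min_gain),
                "absent": grow([x for x, _ in no], [y for _, y in no],
                               depth + 1, max_depth, min_gain),
            }
    # Terminal node: report its size and mean activity.
    return {"n": len(activity), "mean_activity": mean(activity)}

# Hypothetical toy data: 3 fragment descriptors, every combination twice.
rows = list(product((0, 1), repeat=3)) * 2
activity = [1.0 if x[0] == 1 and x[1] == 1 else 0.0 for x in rows]

tree = grow(rows, activity)
print(tree)
```

On this toy data the tree splits first on fragment 0 and then, within the compounds that contain it, on fragment 1, isolating the active subgroup without any explicit interaction term. Fragment 2 is irrelevant to activity and is never chosen.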
