Using Recursive Partitioning to Analyze a Large SAR Data Set
- 1 January 1998
- journal article
- research article
- Published by Taylor & Francis in SAR and QSAR in Environmental Research
- Vol. 8 (3-4), 183-193
- https://doi.org/10.1080/10629369808039140
Abstract
Large data sets are becoming much more prevalent in drug discovery as high throughput screening and combinatorial chemistry data sets become available. Conventional linear parametric methods (linear regression, PCR, PLS, etc.) often fail in analyzing structurally heterogeneous data sets. This is because the underlying relationships may involve nonlinearities, thresholds and interactions, all of which considerably impede linear additive modeling approaches. Also, the conventional assumption that all compounds in a data set are acting by the same mechanism is not expected to hold. Recursive partitioning (RP) is able to accommodate all these difficulties. It is also computationally fast: data sets of 10 to 100k observations pose no particular problems. RP therefore invites investigation as a general approach for the study of structure-activity relationships in large chemistry data sets. The purpose of this paper is to explicate a recursive partitioning procedure, FIRM, through the analysis of a large structure-activity data set: 1650 compounds with 153 fragment descriptors and monoamine oxidase (MAO) activity. The methodology is successful in uncovering compounds acting through different mechanisms.
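To illustrate why recursive partitioning copes with interactions that defeat additive models, the following is a minimal sketch in Python. It is not the FIRM procedure the paper describes: it substitutes a plain greedy split that maximizes the reduction in the sum of squared errors, and the data are a synthetic toy set (three hypothetical binary fragment descriptors, with high activity only when fragments 0 and 1 are both present). The function names `sse`, `best_split` and `partition` are assumptions made for this example.

```python
from itertools import product
from statistics import mean


def sse(values):
    """Sum of squared deviations of the values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, activity):
    """Find the descriptor whose presence/absence split most reduces the SSE.

    Returns (feature index, SSE reduction), or (None, 0.0) if no split helps.
    """
    total = sse(activity)
    best_feature, best_gain = None, 0.0
    for j in range(len(rows[0])):
        present = [a for r, a in zip(rows, activity) if r[j]]
        absent = [a for r, a in zip(rows, activity) if not r[j]]
        if not present or not absent:
            continue  # descriptor is constant in this node, cannot split on it
        gain = total - sse(present) - sse(absent)
        if gain > best_gain:
            best_feature, best_gain = j, gain
    return best_feature, best_gain


def partition(rows, activity, min_gain=1e-9):
    """Recursively split the compounds; return the tree as nested dicts."""
    j, gain = best_split(rows, activity)
    if j is None or gain <= min_gain:
        # Leaf node: report its size and mean activity.
        return {"n": len(rows), "mean": mean(activity)}
    node = {"feature": j, "n": len(rows)}
    for label, keep in (("present", 1), ("absent", 0)):
        idx = [i for i, r in enumerate(rows) if r[j] == keep]
        node[label] = partition(
            [rows[i] for i in idx], [activity[i] for i in idx], min_gain
        )
    return node


# Synthetic toy data (an assumption, not the MAO data set): every combination
# of three binary descriptors, active only when fragments 0 AND 1 co-occur.
rows = [list(bits) for bits in product([0, 1], repeat=3)]
activity = [10.0 if r[0] and r[1] else 1.0 for r in rows]

tree = partition(rows, activity)
print(tree)
```

On this toy set the tree splits first on fragment 0 and then, only among compounds containing it, on fragment 1, so the interaction is isolated in its own terminal node. The irrelevant fragment 2 is never used.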