Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA
- 1 October 1993
- journal article
- research article
- Published by Springer Nature in Journal of Computer-Aided Molecular Design
- Vol. 7 (5) , 587-619
- https://doi.org/10.1007/bf00124364
Abstract
Three-dimensional molecular modeling can provide an unlimited number m of structural properties. Comparative Molecular Field Analysis (CoMFA), for example, may calculate thousands of field values for each model structure. When m is large, partial least squares (PLS) is the statistical method of choice for fitting and predicting biological responses. Yet PLS is usually implemented in a property-based fashion which is optimal only for small m. We describe here a sample-based formulation of PLS which can be used to fit any single response (bioactivity). SAMPLS reduces all explanatory data to the pairwise ‘distances’ among n samples (molecules), or equivalently to an n-by-n covariance matrix C. This matrix, unmodified, can be used to fit all PLS components. Furthermore, SAMPLS will validate the model by modern resampling techniques, at a cost independent of m. We have implemented SAMPLS as a Fortran program and have reproduced conventional and cross-validated PLS analyses of data from two published studies. Full (leave-each-out) cross-validation of a typical CoMFA takes 0.2 CPU s. SAMPLS is thus ideally suited to structure-activity analysis based on CoMFA fields or bonded topology. The sample-distance formulation also relates PLS to methods like cluster analysis and nonlinear mapping, and shows how drastically PLS simplifies the information in CoMFA fields.
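The sample-based idea in the abstract can be illustrated with a minimal sketch. It is not the published Fortran program: the function name `sample_based_pls1`, the column centering, and the Gram-Schmidt step are assumptions made here for illustration. It forms the n-by-n matrix C once from the centered property matrix, then fits each component from C and the current response residual, leaving C unmodified throughout.

```python
import numpy as np

def sample_based_pls1(X, y, n_components):
    """Single-response PLS fitted from the n-by-n matrix C = Xc Xc^T.

    Illustrative sketch only; names and details are assumptions,
    not the published SAMPLS code.
    """
    Xc = X - X.mean(axis=0)            # center each property (column)
    y_mean = y.mean()
    resid = y - y_mean                 # current response residual
    C = Xc @ Xc.T                      # the only step whose cost depends on m
    fitted = np.full(len(y), y_mean, dtype=float)
    scores = []
    for _ in range(n_components):
        t = C @ resid                  # candidate score vector from unmodified C
        for s in scores:               # orthogonalize against earlier scores
            t -= (s @ t) / (s @ s) * s
        b = (t @ resid) / (t @ t)      # regress the residual on the new score
        fitted += b * t
        resid -= b * t
        scores.append(t)
    return fitted, np.column_stack(scores)
```

After C is built, each component costs on the order of n² operations regardless of m, which is the property the abstract emphasizes. The sketch assumes the residual never becomes exactly zero; a production version would need to guard the divisions.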