Assessing Model Fit by Cross-Validation
- 24 January 2003
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 43 (2) , 579-586
- https://doi.org/10.1021/ci025626i
Abstract
When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, and the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use cross-validation, taking care that this is done properly.
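The two validation schemes contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the ordinary least-squares model, the synthetic data, and the helper names (`fit_ols`, `predict_ols`, `loo_q2`, `holdout_r2`) are all assumptions made for illustration. Leave-one-out refits the model once per compound, which is where its extra computational cost comes from, while the hold-out estimate rests on only a handful of test compounds when the sample is small.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (illustrative model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def loo_q2(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / total sum of squares.

    Every compound is predicted by a model fitted to the other n - 1,
    so all n compounds serve for both fitting and assessment.
    """
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop compound i
        coef = fit_ols(X[keep], y[keep])      # refit without it
        pred = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    # Assumption: the denominator uses the full-sample mean of y.
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def holdout_r2(X, y, n_test, rng):
    """Fit on n - n_test compounds, score on a random hold-out of n_test."""
    idx = rng.permutation(len(y))
    test, train = idx[:n_test], idx[n_test:]
    coef = fit_ols(X[train], y[train])
    resid = y[test] - predict_ols(coef, X[test])
    return 1.0 - np.sum(resid ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

# Synthetic "small sample": 30 compounds, 3 descriptors (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=30)

print("leave-one-out q^2:", round(loo_q2(X, y), 3))
# Different random splits give different hold-out answers.
for _ in range(5):
    print("hold-out R^2 (10 test compounds):", round(holdout_r2(X, y, 10, rng), 3))
```

In this sketch the only fitting step is the regression itself. If a real workflow also selects descriptors or tunes parameters, a common reading of doing cross-validation "properly" is that those steps are repeated inside the loop for each left-out compound; that reading is an assumption here, since the abstract does not spell it out.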