The Better Predictive Model: High q² for the Training Set or Low Root Mean Square Error of Prediction for the Test Set?

18 April 2005

journal article
research article
Published by Wiley in QSAR & Combinatorial Science

Vol. 24 (3) , 385-396
https://doi.org/10.1002/qsar.200430909

Abstract

The process of validation of computational models (e.g., QSARs) may become the most important step in their development. Different requirements for the reliability and predictability of QSAR models have been described in the literature. Despite these formal recommendations, there are few practical rules as to when to cease adding variables to a QSAR (i.e., what is an appropriate level of complexity of the model). In this work, the influence of model complexity on statistical fit and error has been investigated using toxicity data for 200 phenols to the ciliated protozoan Tetrahymena pyriformis, with a test set of a further 50 compounds. The results of this investigation showed that several important factors play a role in the definition of a good and reliable QSAR. These include the facts that q² is not a good criterion for a model's predictivity; that outliers should not necessarily be deleted, as this may reduce the chemical space of the model; that the number of descriptors in a multivariate model should be chosen carefully to avoid model under- and over-estimation; and that an appropriate number of dimensions is required for PLS modelling.
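To make the two quantities in the title concrete, the following is a minimal sketch of how a leave-one-out cross-validated q² (computed on the training set) and a root mean square error of prediction, RMSEP (computed on an external test set), are commonly defined. It assumes the standard textbook definitions, q² = 1 − PRESS/SS_total and RMSEP = sqrt(Σ(y − ŷ)²/n_test), which may differ in detail from those used in the article. The data are synthetic, the single descriptor is hypothetical, and all function names are illustrative; only the 200/50 split sizes mirror the abstract.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def q2_loo(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total (training set)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict the held-out compound.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


def rmsep(model, xs_test, ys_test):
    """Root mean square error of prediction on an external test set."""
    a, b = model
    squared = sum((y - (a + b * x)) ** 2 for x, y in zip(xs_test, ys_test))
    return math.sqrt(squared / len(xs_test))


# Synthetic stand-in data (assumption): one hypothetical descriptor with a
# linear relationship to the response plus Gaussian noise.
rng = random.Random(0)


def make_data(n):
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    ys = [0.6 * x - 1.0 + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


x_train, y_train = make_data(200)  # training set size as in the abstract
x_test, y_test = make_data(50)     # external test set size as in the abstract

model = fit_line(x_train, y_train)
q2 = q2_loo(x_train, y_train)
err = rmsep(model, x_test, y_test)
print(f"q2 (LOO, training set) = {q2:.3f}")
print(f"RMSEP (test set)       = {err:.3f}")
```

The point of the sketch is that q² is computed entirely from the training compounds, whereas RMSEP uses compounds the model never saw, so the two can disagree about which model is preferable.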
