Search for Predictive Generic Model of Aqueous Solubility Using Bayesian Neural Nets
- 13 September 2001
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 41 (6) , 1605-1616
- https://doi.org/10.1021/ci010363y
Abstract
Several predictive models of aqueous solubility have been published. They perform well on the data sets used to train them, but these data sets usually contain few structures similar to those of interest in drug research, so their applicability in drug discovery is questionable. A very diverse data set has been assembled from compounds reported in the literature and from proprietary compounds. These compounds were grouped into a literature data set I, a proprietary data set II, and a mixed data set III formed by combining I and II. About 100 descriptors emphasizing surface properties were calculated for every compound. Bayesian learning of neural nets, which retains the advantages of neural nets while avoiding their weaknesses, was used to select the most parsimonious models and to train them on I, II, and III. The models were built either by selecting the most efficient descriptors one by one using a modified Gram-Schmidt procedure (GS) or by simplifying the most complete model using automatic relevance determination (ARD). The predictive ability of the models was assessed on validation data sets chosen to be as unrelated to the training sets as possible, using two new parameters: NDDx,ref, the normalized smallest descriptor distance of a compound x to a reference data set, and CDx,mod, the combination of NDDx,ref with the dispersion of the Bayesian neural net calculations. The results show that a generic predictive model can be obtained from data set I, that the diversity of data set II is too restricted to give a model with good generalization ability, and that the ARD method applied to the mixed data set III gives the best predictive model.
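The one-by-one descriptor selection mentioned in the abstract can be illustrated with a generic forward selection based on Gram-Schmidt orthogonalization. The sketch below is not the authors' modified procedure, whose details the abstract does not give. It assumes a plain descriptor matrix `X` (compounds in rows, descriptors in columns) and a target vector `y`, and the function name `gram_schmidt_rank` is hypothetical. At each step it picks the descriptor most correlated with what remains of the target, then projects that descriptor out of both the remaining descriptors and the target.

```python
import numpy as np


def gram_schmidt_rank(X, y, n_select):
    """Return indices of up to n_select columns of X, ranked by forward
    selection with Gram-Schmidt orthogonalization (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center columns and target so the squared cosine is a squared correlation.
    R = X - X.mean(axis=0)
    r = y - y.mean()
    norms0 = np.linalg.norm(R, axis=0)  # original column norms, for tolerances

    selected = []
    for _ in range(n_select):
        r_norm = np.linalg.norm(r)
        if r_norm < 1e-12:
            break  # target fully explained by the selected descriptors

        norms = np.linalg.norm(R, axis=0)
        # Ignore columns that are (numerically) in the span of selected ones.
        ok = norms > 1e-8 * norms0
        cos2 = np.zeros(R.shape[1])
        cos2[ok] = (R[:, ok].T @ r) ** 2 / (norms[ok] ** 2 * r_norm ** 2)
        cos2[selected] = 0.0  # never pick the same descriptor twice

        best = int(np.argmax(cos2))
        if cos2[best] <= 0.0:
            break  # no remaining descriptor is informative

        selected.append(best)
        q = R[:, best] / norms[best]

        # Remove the component along q from every column and from the target.
        R = R - np.outer(q, q @ R)
        r = r - q * (q @ r)

    return selected
```

Calling `gram_schmidt_rank(X, y, 5)` would return the indices of up to five descriptors in selection order. A network would then be trained on those columns only, which is one way to arrive at a parsimonious model.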