Prediction of Aqueous Solubility of a Diverse Set of Compounds Using Quantitative Structure−Property Relationships

23 July 2003

journal article
research article
Published by American Chemical Society (ACS) in Journal of Medicinal Chemistry

Vol. 46 (17), 3572-3580
https://doi.org/10.1021/jm020266b

Abstract

“Fail early and fail fast” is the current paradigm that the pharmaceutical industry has adopted widely. Removing non-drug-like compounds from the drug discovery lifecycle in the early stages can lead to tremendous savings of resources. Thus, fast screening methods are needed to profile the large collections of synthesized and virtual libraries involved in the early stage. Solubility is one of the filters applied extensively to ensure that compounds are reasonably soluble, so that their synthesis and assay studies of pharmacokinetics and toxicity are feasible. To address this need, we have developed a fast quantitative structure−property relationship (QSPR) model for the prediction of aqueous solubility (at 298 K, unbuffered solution) from molecular structure. Multiple linear regression and genetic algorithms were used to develop the model. The model was based on a set of diverse compounds including small organic molecules and drug and drug-like species. The predicted solubilities for the training and test sets agree well with the experimental values. For the training set of 775 compounds, the coefficient of determination is R² = 0.84 and the RMS error is 0.87. The model was validated on four sets of compounds. The RMS error for the 1665 compounds from the four validation data sets (including compounds from the Physicians' Desk Reference and Comprehensive Medicinal Chemistry databases) is 1 log unit and the unsigned error is 0.77. The model does not require 3-D structure generation, which is rather time-consuming. Using 2-D structures as input, the model is able to compute solubility for 90 000−700 000 compounds/h on an SGI Origin 2000 workstation. This kind of fast calculation allows the model to be used in data mining and screening of large synthesized or virtual libraries.
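The abstract reports three fit statistics: R², RMS error, and unsigned error. As a minimal sketch of how such statistics are commonly computed from experimental and predicted log-solubility values, the Python below uses the standard textbook definitions. The abstract does not spell out its exact formulas, so several things here are assumptions: R² is taken as 1 − SS_res/SS_tot, the "unsigned error" is read as a mean absolute error, and the function names and the four data points are hypothetical illustrations, not values from the paper.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_unsigned_error(observed, predicted):
    """Mean absolute (unsigned) error; assumed reading of 'unsigned error'."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-solubility values, for illustration only.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

print("R^2:", r_squared(observed, predicted))
print("RMS error:", rms_error(observed, predicted))
print("Mean unsigned error:", mean_unsigned_error(observed, predicted))
```

Both error measures are in the same log units as the solubility values, which is why the validation RMS error is quoted as "1 log unit".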
