Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method
- 1 May 2003
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 43 (3) , 1077-1084
- https://doi.org/10.1021/ci034006u
Abstract
The paper describes a fast and flexible descriptor selection method using a genetic algorithm variant (GA-SEC). The relevance of the descriptors will be measured using Shannon entropy (SE) and differential Shannon entropy (DSE), which have very sparse memory requirements and allow the processing of huge data sets. A small quantity of the most important descriptors will be used automatically to build a value prediction model. The most important descriptors are not a linear combination of other descriptors, but transparent, pure descriptors. We used an artificial neural network (ANN) model to predict the aqueous solubility logS and the octanol/water partition coefficient logP. The logS data set was divided into a training set of 1016 compounds and a test set of 253 compounds. A correlation coefficient of 0.93 and an empirical standard deviation of 0.54 were achieved. The logP data set was divided into a training set of 1853 compounds and a test set of 138 compounds. A correlation coefficient of 0.92 and an empirical standard deviation of 0.44 were achieved.
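As a rough illustration of the entropy measures named in the abstract, the following Python sketch computes the Shannon entropy of a single descriptor from a fixed-width histogram, plus one commonly used form of differential Shannon entropy, DSE = SE(A∪B) − (SE(A) + SE(B))/2. The abstract gives no formulas, so the binning scheme, the DSE definition, and the function names are assumptions made for illustration, not the paper's GA-SEC implementation. Only per-bin counts are stored, which fits the low memory use the abstract mentions.

```python
import math
from collections import Counter


def shannon_entropy(values, lo, hi, n_bins=10):
    """Shannon entropy in bits of a descriptor's histogram over [lo, hi].

    Assumption: fixed-width bins; values outside [lo, hi] are clamped
    into the first or last bin.
    """
    if hi <= lo:
        return 0.0  # constant descriptor carries no information
    counts = Counter()
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts.values())
    # Empty bins never appear in the Counter, so log2(0) cannot occur.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def differential_shannon_entropy(a, b, n_bins=10):
    """DSE of one descriptor between two compound sets a and b.

    Assumed definition (hypothetical for this paper):
    SE of the pooled set minus the mean SE of the two sets,
    all computed on a shared value range.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    se_a = shannon_entropy(a, lo, hi, n_bins)
    se_b = shannon_entropy(b, lo, hi, n_bins)
    se_ab = shannon_entropy(list(a) + list(b), lo, hi, n_bins)
    return se_ab - (se_a + se_b) / 2
```

Under these assumptions, a descriptor whose values are spread evenly over the bins scores high SE, and a descriptor whose distribution differs strongly between two compound sets scores high DSE. A selection step could rank descriptors by either score before handing a small subset to the prediction model.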