Combinatorial QSAR of Ambergris Fragrance Compounds
- 5 February 2004
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 44 (2) , 582-595
- https://doi.org/10.1021/ci034203t
Abstract
A combinatorial quantitative structure−activity relationships (Combi-QSAR) approach has been developed and applied to a data set of 98 ambergris fragrance compounds with complex stereochemistry. The Combi-QSAR approach explores all possible combinations of different independent descriptor collections and various individual correlation methods to obtain statistically significant models with high internal (for the training set) and external (for the test set) accuracy. Seven different descriptor collections were generated with commercially available MOE, CoMFA, CoMMA, Dragon, VolSurf, and MolconnZ programs; we also included chirality topological descriptors recently developed in our laboratory (Golbraikh, A.; Bonchev, D.; Tropsha, A. J. Chem. Inf. Comput. Sci. 2001, 41, 147−158). CoMMA descriptors were used in combination with MOE descriptors. MolconnZ descriptors were used in combination with chirality descriptors. Each descriptor collection was combined individually with four correlation methods, including k-nearest neighbors (kNN) classification, Support Vector Machines (SVM), decision trees, and binary QSAR, giving rise to 28 different types of QSAR models. Multiple diverse and representative training and test sets were generated by the divisions of the original data set in two. Each model with high values of leave-one-out cross-validated correct classification rate for the training set was subjected to extensive internal and external validation to avoid overfitting and achieve reliable predictive power. Two validation techniques were employed, i.e., the randomization of the target property (in this case, odor intensity) also known as the Y-randomization test and the assessment of external prediction accuracy using test sets. We demonstrate that not every combination of the data modeling technique and the descriptor collection yields a validated and predictive QSAR model. 
kNN classification in combination with CoMFA descriptors was found to be the best QSAR approach overall since predictive models with correct classification rates for both training and test sets of 0.7 and higher were obtained for all divisions of the ambergris data set into the training and test sets. Many predictive QSAR models were also found using a combination of kNN classification method with other collections of descriptors. The combinatorial QSAR affords automation, computational efficiency, and higher probability of identifying significant QSAR models for experimental data sets than the traditional approaches that rely on a single QSAR method.
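The workflow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a plain Euclidean-distance kNN with k = 3, uses synthetic two-descriptor toy data in place of the ambergris set, and uses placeholder names (`collection_1` to `collection_7`) because the abstract gives the number of descriptor collections but not an explicit final list. All function names are hypothetical. The sketch enumerates the descriptor/method grid, computes a leave-one-out correct classification rate, runs a Y-randomization test, and checks an external test set against the 0.7 threshold quoted in the abstract.

```python
import itertools
import random

# Placeholder labels: the abstract reports seven collections but does not list
# the final seven explicitly, so these names are hypothetical.
DESCRIPTOR_COLLECTIONS = [f"collection_{i}" for i in range(1, 8)]
METHODS = ["kNN", "SVM", "decision tree", "binary QSAR"]  # named in the abstract


def model_grid():
    """Return every (descriptor collection, correlation method) pair."""
    return list(itertools.product(DESCRIPTOR_COLLECTIONS, METHODS))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))
    nearest = sorted(range(len(train_x)), key=sq_dist)[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def ccr(train_x, train_y, test_x, test_y, k=3):
    """Correct classification rate on a held-out (external) test set."""
    hits = sum(knn_predict(train_x, train_y, q, k) == label
               for q, label in zip(test_x, test_y))
    return hits / len(test_x)


def loo_ccr(x, y, k=3):
    """Leave-one-out cross-validated correct classification rate."""
    hits = 0
    for i in range(len(x)):
        hits += knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k) == y[i]
    return hits / len(x)


def y_randomization(x, y, rounds=20, k=3, seed=0):
    """LOO rates after shuffling the class labels; these should fall toward chance."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(loo_ccr(x, shuffled, k))
    return scores


# Synthetic two-descriptor toy data (NOT the ambergris data set).
TRAIN_X = [(0.1 * i, 0.2 * i) for i in range(10)] + \
          [(10 + 0.1 * i, 10 + 0.2 * i) for i in range(10)]
TRAIN_Y = [0] * 10 + [1] * 10
TEST_X = [(0.45, 0.9), (10.45, 10.9)]
TEST_Y = [0, 1]

if __name__ == "__main__":
    print(len(model_grid()), "descriptor/method combinations")
    train_rate = loo_ccr(TRAIN_X, TRAIN_Y)
    test_rate = ccr(TRAIN_X, TRAIN_Y, TEST_X, TEST_Y)
    shuffled = y_randomization(TRAIN_X, TRAIN_Y)
    print("LOO CCR:", train_rate, "test CCR:", test_rate)
    print("mean CCR after Y-randomization:", sum(shuffled) / len(shuffled))
    # Acceptance rule quoted in the abstract: both rates at 0.7 or higher.
    print("accepted:", train_rate >= 0.7 and test_rate >= 0.7)
```

A model whose leave-one-out rate stays high after the labels are shuffled is likely fitting noise, which is why the Y-randomization scores are compared against the rate obtained with the true labels. In a fuller version, the same three checks would be repeated for each of the 28 grid entries and for each training/test division.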