Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays
- 19 February 2008
- journal article
- research article
- Published by Springer Nature in Journal of Computer-Aided Molecular Design
- Vol. 22 (6-7) , 367-384
- https://doi.org/10.1007/s10822-008-9192-9
Abstract
Computational toxicology is emerging as an encouraging alternative to experimental testing. The Molecular Libraries Screening Center Network (MLSCN), part of the NIH Molecular Libraries Roadmap, has recently started generating large and diverse screening datasets, which are publicly available in PubChem. In this report, we investigate various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in those datasets, such predictive models would be useful in evaluating cell-based screening results in general (for example, from reporter assays) and could be used as an aid to identify and eliminate potentially undesired compounds. Specifically, we present the results of random forest ensemble models developed using different cell proliferation datasets and highlight protocols that take into account their extremely imbalanced nature. Depending on the nature of the datasets and the descriptors employed, we were able to achieve percentage correct classification rates between 70% and 85% on the prediction set, though the accuracy rate dropped significantly when the models were applied to in vivo data. In this context, we also compare the MLSCN cell proliferation results with animal acute toxicity data to investigate to what extent animal toxicity can be correlated with, and potentially predicted by, proliferation results. Finally, we present a visualization technique that allows one to compare a new dataset to the training set of the models to decide whether the new dataset may be reliably predicted.
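The abstract does not spell out how the imbalance was handled, so the following is only a minimal sketch of one common approach, not necessarily the protocol used in the paper: each ensemble member is trained on all minority-class compounds (assumed here to be the toxic ones) plus an equally sized random sample of the majority class, and member predictions are combined by majority vote. The function names `balanced_subsets`, `majority_vote`, and `percent_correct` are hypothetical, and the sketch assumes binary labels.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_models, seed=0):
    """Return n_models lists of row indices.

    Each list holds every minority-class row plus an equally sized
    random sample of majority-class rows (assumption: binary labels).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    subsets = []
    for _ in range(n_models):
        # Undersample the majority class down to the minority class size.
        sampled = rng.sample(majority_idx, len(minority_idx))
        subsets.append(sorted(minority_idx + sampled))
    return subsets


def majority_vote(predictions):
    """Combine the class predictions of all ensemble members for one compound."""
    return Counter(predictions).most_common(1)[0][0]


def percent_correct(y_true, y_pred):
    """Percentage of compounds whose predicted class matches the observed class."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)
```

Each index list would be used to fit one model (for instance, one random forest) on the corresponding rows of the descriptor matrix. On data this imbalanced, per-class rates are worth reporting alongside the overall percentage correct, since a model that always predicts the majority class can still score highly overall.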