Prediction of Protein Retention Times in Anion-Exchange Chromatography Systems Using Support Vector Regression
- 15 October 2002
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 42 (6), 1347-1357
- https://doi.org/10.1021/ci025580t
Abstract
Quantitative Structure-Retention Relationship (QSRR) models are developed for the prediction of protein retention times in anion-exchange chromatography systems. Topological, subdivided surface area, and TAE (Transferable Atom Equivalent) electron-density-based descriptors are computed directly for a set of proteins using molecular connectivity patterns and crystal structure geometries. A novel algorithm based on Support Vector Machine (SVM) regression has been employed to obtain predictive QSRR models using a two-step computational strategy. In the first step, a sparse linear SVM was utilized as a feature selection procedure to remove irrelevant or redundant information. Subsequently, the selected features were used to produce an ensemble of nonlinear SVM regression models that were combined using bootstrap aggregation (bagging) techniques, where various combinations of training and validation data sets were selected from the pool of available data. A visualization scheme (star plots) was used to display the relative importance of each selected descriptor in the final set of "bagged" models. Once these predictive models have been validated, they can be used as an automated prediction tool for virtual high-throughput screening (VHTS).
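The first step, a sparse linear SVM used as a feature filter, can be sketched as follows. This is a minimal illustration, assuming an L1-penalized epsilon-insensitive regression loss minimized by proximal subgradient descent; the abstract does not give the authors' exact formulation or solver, and the function names and parameters (`lam`, `epsilon`, `lr`, `epochs`) are hypothetical. Descriptors whose weights are driven exactly to zero are discarded, and the survivors are what the nonlinear models would be trained on.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward 0, snapping small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_sparse_linear_svr(X, y, lam=0.1, epsilon=0.1, lr=0.05, epochs=2000):
    """Hypothetical sketch: fit f(x) = w.x + b by minimizing
        lam * ||w||_1 + (1/n) * sum_i max(0, |f(x_i) - y_i| - epsilon)
    with proximal subgradient descent (subgradient step on the loss,
    soft-thresholding step for the L1 penalty)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            if abs(r) <= epsilon:
                continue  # inside the epsilon-tube: no loss, no gradient
            s = 1.0 if r > 0 else -1.0
            for j in range(d):
                gw[j] += s * xi[j] / n
            gb += s / n
        # Gradient step, then L1 shrinkage; this is what produces exact zeros.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb  # the intercept is not penalized
    return w, b


def select_features(w, tol=1e-8):
    """Indices of descriptors whose weight survived the L1 penalty."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]
```

Because a single penalty `lam` is applied to every weight, the descriptors would normally be standardized before fitting so that the selection is not driven by their raw units.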
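The second step combines several regression models by bootstrap aggregation. The sketch below shows only the bagging mechanics, assuming each model is trained on a bootstrap resample and validated on the rows that resample left out (its out-of-bag set); the abstract does not state exactly how the training and validation sets were drawn. The base learner `fit` is a hypothetical callable that returns a predictor; in the paper's setting it would fit a nonlinear SVR on the selected descriptors, which this sketch does not implement.

```python
import random
from statistics import mean


def bag_models(fit, X, y, n_models=25, seed=0):
    """Fit n_models base learners, each on a bootstrap resample of (X, y).

    `fit(X_train, y_train)` must return a predictor `x -> float`.
    Rows not drawn into a resample form that model's out-of-bag
    (validation) set."""
    rng = random.Random(seed)
    n = len(X)
    models, oob_sets = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
        oob_sets.append(set(range(n)) - set(idx))
    return models, oob_sets


def bagged_predict(models, x):
    """Bagging for regression: average the ensemble members' predictions."""
    return mean(m(x) for m in models)


def oob_rmse(models, oob_sets, X, y):
    """RMSE where each row is predicted only by models that never trained on it."""
    sq_errors = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        preds = [m(xi) for m, oob in zip(models, oob_sets) if i in oob]
        if preds:
            sq_errors.append((mean(preds) - yi) ** 2)
    return mean(sq_errors) ** 0.5 if sq_errors else float("nan")
```

For example, with scikit-learn installed, `fit` could build `model = sklearn.svm.SVR(kernel="rbf").fit(X_train, y_train)` and return `lambda x: model.predict([x])[0]`; the RBF kernel and its default hyperparameters are illustrative assumptions, not settings reported in the abstract. Averaging over resamples mainly reduces the variance of the individual nonlinear models, and the out-of-bag error gives a validation estimate without a separate hold-out set.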