A maximum common substructure-based algorithm for searching and predicting drug-like compounds
Open Access
- 1 July 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (13) , i366-i374
- https://doi.org/10.1093/bioinformatics/btn186
Abstract
Motivation: The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid or consider only global similarities between structures. The maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds.
Results: In this article, a new backtracking algorithm for MCS is proposed and compared to global similarity measurements. Our algorithm provides high flexibility in the matching process, and it is very efficient in identifying local structural similarities. To predict and cluster biologically active compounds more efficiently, the concept of basis compounds is proposed that enables researchers to easily combine the MCS-based and traditional similarity measures with modern machine learning techniques. Support vector machines (SVMs) are used to test how the MCS-based similarity measure and the basis compound vectorization method perform on two empirically tested datasets. The test results show that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, our SVM-based model predicts the biological activities of chemical compounds with higher specificity and sensitivity.
Contact: ycao@cs.ucr.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
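The abstract combines three ideas: an MCS computed by backtracking, a similarity score derived from the MCS size, and "basis compound" vectors that turn pairwise similarities into fixed-length features for an SVM. The following is a minimal sketch of those ideas under stated assumptions, not the authors' algorithm. It assumes hydrogen-suppressed molecules given as atom labels plus bond index pairs, ignores bond orders, and searches for a maximum common *induced* subgraph that may be disconnected, using a generic textbook backtracking search. The Tanimoto-style score |MCS| / (|A| + |B| - |MCS|) is one common choice and is assumed here. All names (`mcs_size`, `mcs_tanimoto`, `basis_vector`, the toy molecules) are hypothetical.

```python
"""Sketch: MCS-based similarity and basis-compound vectors (illustrative only)."""

from typing import List, Sequence, Tuple

# Hypothetical hydrogen-suppressed molecule: (atom labels, bonds as index pairs).
Molecule = Tuple[Sequence[str], Sequence[Tuple[int, int]]]


def _adjacency(n: int, bonds: Sequence[Tuple[int, int]]) -> List[List[bool]]:
    """Symmetric boolean adjacency matrix; bond orders are ignored (assumption)."""
    adj = [[False] * n for _ in range(n)]
    for a, b in bonds:
        adj[a][b] = adj[b][a] = True
    return adj


def mcs_size(m1: Molecule, m2: Molecule) -> int:
    """Atom count of a maximum common induced subgraph (may be disconnected).

    Plain backtracking: each atom of m1 is either mapped to an unused,
    same-label atom of m2 whose bonds to all already-mapped atoms agree,
    or left unmapped. Exponential in the worst case; fine for tiny inputs.
    """
    labels1, labels2 = m1[0], m2[0]
    n1, n2 = len(labels1), len(labels2)
    adj1, adj2 = _adjacency(n1, m1[1]), _adjacency(n2, m2[1])
    mapping: List[Tuple[int, int]] = []  # (atom in m1, atom in m2) pairs
    used = [False] * n2
    best = 0

    def extend(i: int) -> None:
        nonlocal best
        best = max(best, len(mapping))
        # Prune: even mapping every remaining m1 atom cannot beat `best`.
        if i == n1 or len(mapping) + (n1 - i) <= best:
            return
        for j in range(n2):
            if used[j] or labels1[i] != labels2[j]:
                continue
            # Induced-subgraph check: bond / no-bond must agree pairwise.
            if all(adj1[i][a] == adj2[j][b] for a, b in mapping):
                used[j] = True
                mapping.append((i, j))
                extend(i + 1)
                mapping.pop()
                used[j] = False
        extend(i + 1)  # branch where atom i of m1 stays unmapped

    extend(0)
    return best


def mcs_tanimoto(m1: Molecule, m2: Molecule) -> float:
    """Tanimoto-style score: |MCS| / (|m1| + |m2| - |MCS|), counted in atoms."""
    k = mcs_size(m1, m2)
    return k / (len(m1[0]) + len(m2[0]) - k)


def basis_vector(compound: Molecule, basis: Sequence[Molecule]) -> List[float]:
    """Describe a compound by its MCS similarity to each basis compound."""
    return [mcs_tanimoto(compound, b) for b in basis]


# Toy hydrogen-suppressed examples (bond orders dropped).
ETHANOL: Molecule = (["C", "C", "O"], [(0, 1), (1, 2)])
METHANOL: Molecule = (["C", "O"], [(0, 1)])
PROPANE: Molecule = (["C", "C", "C"], [(0, 1), (1, 2)])

if __name__ == "__main__":
    print(basis_vector(ETHANOL, [ETHANOL, METHANOL, PROPANE]))
```

Run as a script, this should print `[1.0, 0.6666666666666666, 0.5]`: ethanol shares all 3 atoms with itself, the C-O pair with methanol, and a C-C pair with propane. Each compound thus becomes a vector with one entry per basis compound, which any SVM implementation can consume. Concatenating such a vector with atom pair-based similarities to the same basis set is one plausible way to combine the two measures as the abstract describes; the paper's exact combination scheme is not reproduced here.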