Database Searching for Compounds with Similar Biological Activity Using Short Binary Bit String Representations of Molecules
- 31 July 1999
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 39 (5) , 881-886
- https://doi.org/10.1021/ci990308d
Abstract
In an effort to identify biologically active molecules in compound databases, we have investigated similarity searching using short binary bit strings with a maximum of 54 bit positions. These “minifingerprints” (MFPs) were designed to account for the presence or absence of structural fragments and/or aromatic character, flexibility, and hydrogen-bonding capacity of molecules. MFP design was based on an analysis of distributions of molecular descriptors and structural fragments in two large compound collections. The performance of different MFPs and a reference fingerprint was tested by systematic “one-against-all” similarity searches of molecules in a database containing 364 compounds with different biological activities. For each fingerprint, the most effective similarity cutoff value was determined. An MFP accounting for only 32 structural fragments showed less than 2% false positive similarity matches and correctly assigned on average ∼40% of the compounds with the same biological activity to a query molecule. Inclusion of three numerical two-dimensional (2D) molecular descriptors increased the performance by 15%. This MFP performed better than a complex 2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint totally eliminated false positives but recognized less than 10% of the compounds within the same activity class.
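The “one-against-all” search with a similarity cutoff described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the article: the abstract does not name its similarity metric, so the Tanimoto coefficient is used as a common choice for bit strings; the 6-bit toy fingerprints, the class labels, and the function names `tanimoto` and `one_against_all` are hypothetical; and the way recall and false positive rate are averaged per query may differ from the authors' definitions.

```python
from typing import Dict, Tuple

# Hypothetical toy data: short bit strings stored as Python ints,
# plus an activity class label per compound (not from the article).
FPS: Dict[str, int] = {"a1": 0b111100, "a2": 0b111000,
                       "b1": 0b000111, "b2": 0b001111}
CLASSES: Dict[str, str] = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits set in both / bits set in either."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return common / union if union else 0.0


def one_against_all(fps: Dict[str, int], classes: Dict[str, str],
                    cutoff: float) -> Tuple[float, float]:
    """Use each compound once as the query against all others.

    Returns (mean fraction of same-class compounds retrieved,
             mean fraction of other-class compounds retrieved).
    These per-query averages are an assumed scoring scheme.
    """
    recalls, fp_rates = [], []
    for query, q_fp in fps.items():
        same = [n for n in fps if n != query and classes[n] == classes[query]]
        other = [n for n in fps if classes[n] != classes[query]]
        # A hit is any other compound at or above the similarity cutoff.
        hits = {n for n in fps
                if n != query and tanimoto(q_fp, fps[n]) >= cutoff}
        if same:
            recalls.append(sum(n in hits for n in same) / len(same))
        if other:
            fp_rates.append(sum(n in hits for n in other) / len(other))

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(recalls), mean(fp_rates)


if __name__ == "__main__":
    # Scan a few cutoffs to see the recall / false positive trade-off.
    for cutoff in (0.3, 0.7, 0.85):
        print(cutoff, one_against_all(FPS, CLASSES, cutoff))
```

Scanning the cutoff in this way mirrors, in spirit, the abstract's statement that the most effective cutoff was determined per fingerprint: a higher cutoff suppresses false positives at the cost of recognizing fewer compounds from the same activity class.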