Drug-like Annotation and Duplicate Analysis of a 23-Supplier Chemical Database Totalling 2.7 Million Compounds

Abstract
We have implemented five drug-like filters, based on 1D and 2D molecular descriptors, and applied them to characterize the drug-like properties of commercially available chemical compounds. In addition to two previously published filters (Lipinski and Veber), we implemented a filter for medicinal chemistry tractability based on lists of chemical features drawn up by a panel of medicinal chemists. Two further filters were derived in-house: one based on the modeling of aqueous solubility (>1 μM) and one based on the modeling of Caco-2 passive membrane permeability (>10 nm/s). A library of 2.7 million compounds was collated from 23 compound suppliers and analyzed with these filters, revealing a tendency toward highly lipophilic compounds. The library contains 1.6 million unique structures, of which 37% (607 223) passed all five drug-like filters. No single supplier provides all the members of the drug-like subset, which emphasizes the benefit of drawing on multiple compound suppliers as a source of diversity for drug discovery.
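To make the filtering and duplicate analysis concrete, the following is a minimal Python sketch, not the implementation used in this work. It covers only the two published filters, using the commonly cited thresholds (Lipinski: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors; Veber: at most 10 rotatable bonds and polar surface area ≤ 140 Å²). The names `Descriptors`, `drug_like_subset`, the `max_violations` parameter, the structure keys, and all descriptor values are hypothetical, and descriptors are assumed to be precomputed. The tractability, solubility, and permeability filters are omitted because their definitions are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed 1D/2D descriptors for one structure (hypothetical container)."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    polar_surface_area: float  # in square angstroms


def passes_lipinski(d, max_violations=0):
    # Count violations of the four rule-of-five thresholds.
    # Assumption: how many violations are tolerated is left as a parameter.
    violations = sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])
    return violations <= max_violations


def passes_veber(d):
    # Veber criteria: limited flexibility and limited polar surface area.
    return d.rotatable_bonds <= 10 and d.polar_surface_area <= 140


def drug_like_subset(catalog):
    """Merge duplicates across suppliers, then keep structures passing both filters.

    catalog: iterable of (supplier, structure_key, Descriptors), where
    structure_key is any canonical structure identifier (assumed unique
    per structure). Returns {structure_key: set of suppliers offering it}.
    """
    suppliers_by_key = {}
    descriptors_by_key = {}
    for supplier, key, desc in catalog:
        suppliers_by_key.setdefault(key, set()).add(supplier)
        descriptors_by_key[key] = desc
    return {
        key: suppliers
        for key, suppliers in suppliers_by_key.items()
        if passes_lipinski(descriptors_by_key[key])
        and passes_veber(descriptors_by_key[key])
    }


# Illustrative, made-up catalog entries.
common = Descriptors(350.4, 2.1, 2, 5, 4, 75.0)
catalog = [
    ("supplier_a", "KEY-1", common),
    ("supplier_b", "KEY-1", common),  # same structure from a second supplier
    ("supplier_b", "KEY-2", Descriptors(612.8, 6.3, 1, 7, 12, 95.0)),  # heavy, lipophilic
    ("supplier_c", "KEY-3", Descriptors(289.3, 1.4, 1, 4, 3, 62.0)),
]

subset = drug_like_subset(catalog)
all_suppliers = {supplier for supplier, _, _ in catalog}
# Suppliers that offer every member of the drug-like subset.
full_coverage = sorted(
    s for s in all_suppliers if all(s in offered for offered in subset.values())
)
```

In this toy catalog the duplicate entry collapses to one unique structure, the heavy lipophilic compound is rejected, and `full_coverage` is empty, mirroring on a small scale the observation that no single supplier covers the whole drug-like subset.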