Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery
- 30 January 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 47 (2) , 342-353
- https://doi.org/10.1021/ci600423u
Abstract
All molecules of up to 11 atoms of C, N, O, and F possible under consideration of simple valency, chemical stability, and synthetic feasibility rules were generated and collected in a database (GDB). GDB contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple bonds. By comparison, only 63 857 compounds of up to 11 atoms were found in public databases (a combination of PubChem, ChemACX, ChemSCX, NCI open database, and the Merck Index). A total of 538 of the 1208 ring systems in GDB are currently unknown in the CAS Registry and Beilstein databases in any carbon/heteroatom/multiple-bond combination or as a substructure. Over 70% of GDB molecules are chiral. Because of their small size, all compounds obey Lipinski's bioavailability rule. A total of 13.2 million compounds also follow Congreve's “Rule of 3” for lead-likeness. A Kohonen map trained with autocorrelation descriptors organizes GDB according to compound classes and shows that leadlike compounds are most abundant in chiral regions of fused carbocycles and fused heterocycles. The projection of known compounds into this map indicates large uncharted areas of chemical space. The potential of GDB for drug discovery is illustrated by virtual screening for kinase inhibitors, G-protein coupled receptor ligands, and ion-channel modulators. The database is available from the author's Web page.
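The two property filters named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the commonly cited thresholds (Lipinski: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors; Congreve's Rule of 3: molecular weight < 300, logP ≤ 3, at most 3 donors and 3 acceptors), applies them strictly with no tolerated violations, and expects descriptor values to be precomputed elsewhere. The `Descriptors` class, the function names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # calculated octanol/water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def passes_lipinski(d: Descriptors) -> bool:
    """Strict reading of Lipinski's rule: every threshold must hold."""
    return (
        d.mol_weight <= 500
        and d.logp <= 5
        and d.h_donors <= 5
        and d.h_acceptors <= 10
    )


def passes_rule_of_three(d: Descriptors) -> bool:
    """Core criteria of Congreve's Rule of 3 (assumed thresholds)."""
    return (
        d.mol_weight < 300
        and d.logp <= 3
        and d.h_donors <= 3
        and d.h_acceptors <= 3
    )


# Made-up descriptor values, for illustration only.
small_fragment = Descriptors(mol_weight=150.0, logp=1.2, h_donors=1, h_acceptors=2)
donor_rich = Descriptors(mol_weight=150.0, logp=1.2, h_donors=4, h_acceptors=2)

for name, d in [("small_fragment", small_fragment), ("donor_rich", donor_rich)]:
    print(name, passes_lipinski(d), passes_rule_of_three(d))
```

Because the Rule of 3 thresholds are tighter than Lipinski's on every property, a compound can pass the bioavailability rule and still fail the lead-likeness filter. This is consistent with the abstract reporting that all GDB compounds obey Lipinski's rule while only 13.2 million follow the Rule of 3.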