A complete small molecule dataset from the protein data bank
- 17 February 2006
- journal article
- Published by Wiley in FEBS Letters
- Vol. 580 (6) , 1649-1653
- https://doi.org/10.1016/j.febslet.2006.02.003
Abstract
A complete set of 6300 small molecule ligands was extracted from the Protein Data Bank and deposited online in PubChem as data source ‘SMID’. The set’s major improvement over prior methods is the inclusion of cyclic polypeptides and branched polysaccharides, with an unambiguous nomenclature, in addition to normal monomeric ligands. Only the best available example of each ligand structure is retained, and an additional dataset is maintained containing co-ordinates for all examples of each structure. Attempts were made to correct ambiguous atomic elements and other common errors, and a perception algorithm was used to determine bond order and aromaticity when no other information was available.