Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution
- 17 February 2009
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Modeling
- Vol. 49 (3), 740-743
- https://doi.org/10.1021/ci800067r
Abstract
Until recently, most scientific and patent documents dealing with chemistry described molecular structures either with systematic names or with graphical images of Kekulé structures. The latter method poses inherent problems for automated processing, which becomes necessary when the number of documents ranges into the hundreds of thousands or even millions, because graphical representations cannot be directly interpreted by a computer. To recover this structural information, which is otherwise all but lost, we have built OSRA, an optical structure recognition application based on modern advances in image processing and implemented with open source tools. OSRA can read documents in over 90 graphical formats, including GIF, JPEG, PNG, TIFF, PDF, and PS. It automatically recognizes and extracts the graphical information representing chemical structures in such documents and generates the SMILES or SD representation of each molecular structure image it encounters.
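The abstract names SD as one of OSRA's output formats but does not show what such a record contains. As an illustration only, here is a minimal sketch that assumes a recognizer has already produced a list of atoms with 2D coordinates and a list of bonds, and serializes them as one record in the MDL V2000 SD format. The function `to_sd_record` and its input layout are hypothetical; they are not OSRA's API or implementation, and only the basic V2000 fields are written (no charges, stereo, or isotopes).

```python
from typing import List, Tuple

# Hypothetical recognizer output: (element symbol, x, y) in 2D drawing coordinates.
Atom = Tuple[str, float, float]
# Hypothetical bond layout: (1-based first atom, 1-based second atom, bond order).
Bond = Tuple[int, int, int]


def to_sd_record(name: str, atoms: List[Atom], bonds: List[Bond]) -> str:
    """Serialize a 2D molecular graph as a single MDL V2000 SD-file record."""
    # Three-line header: molecule name, program/timestamp line, comment line.
    lines = [name, "", ""]
    # Counts line: atom count, bond count, eight zero fields, "999", version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for symbol, x, y in atoms:
        # x, y, z as 10.4f (z = 0 for a 2D depiction), then the element symbol
        # padded to 3 characters, a mass-difference field, and 11 zeroed fields.
        lines.append(f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3}" + " 0" + "  0" * 11)
    for a, b, order in bonds:
        # First atom, second atom, bond type, then stereo and remaining fields as 0.
        lines.append(f"{a:3d}{b:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    # "$$$$" separates records, turning a single molfile into an SD-file entry.
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Ethanol drawn as a zigzag C-C-O chain with single bonds.
    atoms = [("C", 0.0, 0.0), ("C", 1.299, 0.75), ("O", 2.598, 0.0)]
    bonds = [(1, 2, 1), (2, 3, 1)]
    print(to_sd_record("ethanol", atoms, bonds), end="")
```

A real recognizer working in pixel space would also need to rescale coordinates and flip the y axis, since image rows grow downward while molfile y coordinates conventionally grow upward; that step is omitted here.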