Major components of a complete text reading system
- 1 July 1992
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 80 (7), 1133-1149
- https://doi.org/10.1109/5.156475
Abstract
The document image processes used in a recently developed text reading system are described. The system consists of three major components: document analysis, document understanding, and character segmentation/recognition. The document analysis component extracts lines of text from a page for recognition. The document understanding component extracts logical relationships between the document constituents. The character segmentation/recognition component extracts characters from a text line and recognizes them. Experiments on more than a hundred documents have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.