Document image understanding: geometric and logical layout
- 1 January 1994
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636919, pp. 385-390
- https://doi.org/10.1109/cvpr.1994.323855
Abstract
Document image understanding encompasses the technology required to make paper documents equivalent to other computer exchange media such as floppy disks, tapes, and CD-ROMs. The physical reader of a paper document is the scanner, just as the physical reader of a floppy disk is the floppy drive, that of a tape cartridge is the tape cartridge drive, and that of a CD-ROM is the CD-ROM drive. In the survey presented here, we restrict ourselves to documents such as business letters, forms, and scientific and technical articles like those found in archival journals and technical conference proceedings. Understanding such documents involves estimating the rotation skew of each document page, determining the geometric page layout, labeling blocks as text or non-text, determining the read order of text blocks, recognizing the text of text blocks with an OCR system, determining the logical page layout, and formatting the data and information of the document in a form suitable for a word processing system or an information retrieval system.
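The first step in that pipeline, estimating the rotation skew of a page, can be illustrated with the projection-profile method, one widely used technique. The abstract does not say which skew estimator the survey favors, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the page has already been binarized into a list of foreground pixel coordinates `(x, y)` with `y` growing downward, and that the skew lies within ±5°. The function names `profile_energy` and `estimate_skew` and their parameters are hypothetical.

```python
import math
from collections import Counter


def profile_energy(pixels, angle_deg):
    """Sum of squared bin counts of the projection profile at angle_deg.

    Each foreground pixel is projected onto the axis perpendicular to text
    lines tilted by angle_deg. When the angle matches the true skew, pixels
    from one text line collapse into few bins, so the energy peaks.
    """
    a = math.radians(angle_deg)
    s, c = math.sin(a), math.cos(a)
    bins = Counter(round(y * c - x * s) for x, y in pixels)
    return sum(n * n for n in bins.values())


def estimate_skew(pixels, max_angle=5.0, step=0.1):
    """Return the candidate angle (degrees) whose profile is sharpest.

    Assumption: positive angles mean lines descend to the right in image
    coordinates (y grows downward). Ties go to the angle closest to zero.
    """
    n = round(max_angle / step)
    # Integer multiples of step avoid accumulating floating-point error.
    candidates = [k * step for k in range(-n, n + 1)]
    return max(candidates, key=lambda a: (profile_energy(pixels, a), -abs(a)))


if __name__ == "__main__":
    # Hypothetical synthetic page: three 200-pixel "text lines" tilted by 2 degrees.
    t = math.tan(math.radians(2.0))
    page = [(x, round(y0 + x * t)) for y0 in (20, 60, 100) for x in range(200)]
    print(f"estimated skew: {estimate_skew(page):.1f} degrees")
```

On the synthetic page the search should report an angle close to 2°. Deskewing then amounts to rotating the page by the negative of the estimated angle before the geometric layout is analyzed. The exhaustive search over candidate angles is simple but slow for large pages; a coarse-to-fine search is a common refinement.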