Language identification for printed text independent of segmentation
- 19 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 3, pp. 428-431
- https://doi.org/10.1109/icip.1995.537663
Abstract
This paper presents efficient algorithms for determining the language classification of machine-generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting facsimile documents as they arrive, so that appropriate routing and secondary analysis, which may include OCR, are selected for each document. They may also prove useful as a component of a content-addressable document access system. There have been numerous reported efforts that attempt to segment printed documents into homogeneous regions using Hough transforms, hidden Markov models, morphological filtering, and neural networks. However, language identification can be accomplished without explicit segmentation using the less computationally intensive methods described.