Character extraction from noisy background for an automatic reference system

1 January 1999

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 143-146
https://doi.org/10.1109/icdar.1999.791745

Abstract

It is important to make digitized manuscripts of old literature (as page images) and their electronic text (as full text) available on the Internet, with an automatic reference mechanism linking the images to the text. As an essential step toward such an automatic reference system, this paper addresses the extraction of character areas from page images of old handwritten manuscripts. Page images of old manuscripts are usually heavily soiled and considerably large. To overcome the first problem, we propose a new method for separating characters from a noisy background, since conventional threshold selection techniques cannot cope with images in which the gray levels of the characters overlap those of the background. To solve the second problem, we propose an approach based on a downscaled image and a recursive labeling method for word extraction. This approach suits large images because it saves memory and reduces processing time.
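
The abstract does not give the details of the downscaling or labeling steps, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an already binarized page (1 = ink, 0 = background), an assumed block-wise "any ink" downscaling by a hypothetical factor `k`, and 8-connected labeling done with an explicit stack in place of recursion. All function names are illustrative.

```python
from typing import List, Tuple

Image = List[List[int]]           # 1 = ink (foreground), 0 = background
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), inclusive


def downscale(img: Image, k: int) -> Image:
    """Shrink by factor k: a cell is 1 if any pixel in its k x k block is 1."""
    h, w = len(img), len(img[0])
    small = [[0] * ((w + k - 1) // k) for _ in range((h + k - 1) // k)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                small[y // k][x // k] = 1
    return small


def label_regions(small: Image) -> List[Box]:
    """Bounding boxes of 8-connected regions, in row-major scan order."""
    h, w = len(small), len(small[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for sy in range(h):
        for sx in range(w):
            if not small[sy][sx] or seen[sy][sx]:
                continue
            # Flood fill from this seed; the stack replaces recursion.
            stack = [(sx, sy)]
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            while stack:
                x, y = stack.pop()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and small[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def extract_boxes(img: Image, k: int) -> List[Box]:
    """Label on the small image, then map boxes back to full resolution."""
    h, w = len(img), len(img[0])
    return [
        (x0 * k, y0 * k, min((x1 + 1) * k, w) - 1, min((y1 + 1) * k, h) - 1)
        for x0, y0, x1, y1 in label_regions(downscale(img, k))
    ]


page = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
boxes = extract_boxes(page, 2)
```

Labeling runs on an image with roughly 1/k² as many cells as the original, which is where the memory and time savings in this sketch come from; the price is that components closer together than `k` pixels may merge into one region.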
