Character extraction from noisy background for an automatic reference system

Abstract
It is important to make digitized manuscripts of old literature available on the Internet both as page images and as full electronic text, with an automatic reference mechanism linking the images to the text. As an essential step toward such an automatic reference system, this paper addresses the extraction of character areas from page images of old handwritten manuscripts. Page images of old manuscripts are usually heavily soiled and considerably large. To overcome the first problem, we propose a new, effective method for separating characters from a noisy background, since conventional threshold selection techniques cannot cope with images in which the gray levels of the character parts overlap those of the background. To solve the second problem, we propose an approach to word extraction based on a downscaled image and a recursive labeling method. This approach suits large images because it saves memory and reduces processing time.
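
The idea of labeling on a downscaled image can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give, and it rests on several assumptions: the page is a list of rows of gray levels with dark ink on a light background, a fixed threshold stands in for the proposed character/background separation, downscaling keeps the darkest pixel of each block, and the labeling uses an explicit stack in place of recursion. The names downscale_min, label_regions, and extract_regions are hypothetical.

```python
def downscale_min(image, factor):
    """Shrink a grayscale image, keeping the darkest pixel of each block.

    Assumption: taking the block minimum preserves thin dark strokes.
    """
    rows, cols = len(image), len(image[0])
    small = []
    for r in range(0, rows, factor):
        small_row = []
        for c in range(0, cols, factor):
            block = [image[y][x]
                     for y in range(r, min(r + factor, rows))
                     for x in range(c, min(c + factor, cols))]
            small_row.append(min(block))
        small.append(small_row)
    return small


def label_regions(image, threshold):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    regions whose gray level is below the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Flood fill from this seed pixel with an explicit stack.
            seen[r][c] = True
            stack = [(r, c)]
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] < threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def extract_regions(image, factor=2, threshold=128):
    """Label regions on the downscaled image and map each bounding box
    back to full-resolution pixel coordinates (inclusive)."""
    rows, cols = len(image), len(image[0])
    small = downscale_min(image, factor)
    return [(t * factor, l * factor,
             min((b + 1) * factor, rows) - 1,
             min((r + 1) * factor, cols) - 1)
            for t, l, b, r in label_regions(small, threshold)]
```

Calling extract_regions(page, factor=4) on a page image would visit roughly one sixteenth of the pixels during labeling, which is where the memory and time savings mentioned above would come from in this sketch. The returned boxes are coarse, aligned to the block grid, and could be refined on the full-resolution image if tighter bounds are needed.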
