Gray-scale character recognition using boundary features

Abstract
Optical character recognition (OCR) is traditionally applied to binary-valued imagery, although text is always scanned and stored in gray scale. Binarization of a multivalued image may remove important topological information from characters and introduce noise into the character background. Low-quality imagery, produced by poorly printed text and improper image lift, magnifies the shortcomings of this process. A character classifier is proposed that recognizes gray-scale characters by extracting structural features from character outlines. A fast, local-contrast-based gray-scale edge detector has been developed to locate character boundaries. A pixel is considered an edge pixel if its gray value is below a threshold and it has a neighbor whose gray value is above the threshold. Edges are then thinned to a width of one pixel. Structural features are extracted by convolving the edges with a set of feature templates. Currently, 16 features, such as strokes, curves, and corners, are considered. The extracted features are compressed into a binary vector of 576 features, which is used as input to a classifier. This approach is being tested on machine-printed characters extracted from mail address blocks. Characters are sampled at 300 ppi and quantized with 8 bits. Experimental results also demonstrate that recognition rates can be improved by enhancing image quality prior to boundary detection.
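
The edge-pixel rule stated in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: a single global threshold stands in for the local-contrast threshold, neighbors are taken as the 8-connected neighborhood by default, and the function name `edge_pixels` and its parameters are hypothetical. Thinning and template convolution are not shown.

```python
def edge_pixels(image, threshold, eight_connected=True):
    """Mark pixels darker than `threshold` that touch a pixel brighter than it.

    `image` is a list of rows of gray values (e.g. 0-255 for 8-bit data).
    Assumption: one global threshold; the paper's detector is local-contrast
    based, and how its threshold is chosen is not described in the abstract.
    """
    rows, cols = len(image), len(image[0])
    if eight_connected:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold:
                continue  # not below the threshold, so not an edge candidate
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and image[nr][nc] > threshold:
                    edges[r][c] = 1  # dark pixel with a brighter neighbor
                    break
    return edges


# A dark 3x3 block on a bright background: the ring is edge, the center is not.
image = [
    [200, 200, 200, 200, 200],
    [200,  50,  50,  50, 200],
    [200,  50,  50,  50, 200],
    [200,  50,  50,  50, 200],
    [200, 200, 200, 200, 200],
]
for row in edge_pixels(image, threshold=128):
    print(row)
```

Under these assumptions, only the dark pixels bordering the bright background are marked, so the result is an outline on the character side of the boundary. In this small example the outline is already one pixel wide; on real scans a separate thinning pass, as the abstract describes, would still be needed.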
