Segmenting handwritten text lines into words using distance algorithms
- 1 August 1992
- proceedings article
- Published by SPIE-Intl Soc Optical Eng
Abstract
This paper explores different distance algorithms that can group connected components of a handwritten text line into words. A binarized handwritten text image normally consists of many connected components, where each component is a character fragment, an isolated character, or a group of characters. When the writing style is unconstrained, recognition of individual components is unreliable, so the components must be grouped into words before recognition algorithms (which may require dictionaries) can be used. Algorithms that compute the distance between connected components can indicate how the connected components should be clustered into words. We show that fast, straightforward distance algorithms (such as using the horizontal distance between the components' bounding boxes) have mediocre performance. Euclidean distance algorithms perform well but are computationally slow. This paper describes original methods of computing distances. These algorithms include combining a set of horizontal distances between components (applied to each pixel row) with the Euclidean and bounding box methods to achieve high performance and reasonable speed. We examine six distance algorithms, and each is tested on unconstrained handwritten address images.
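The abstract names three families of distance measures without giving their definitions. The sketch below is only an illustration of what such measures might look like, not the paper's six algorithms: it assumes each connected component is a set of `(row, col)` pixel coordinates, and the function names, the per-row minimum rule, the fallback to the bounding box gap, and the gap `threshold` are all hypothetical choices made here for the example.

```python
import math
from collections import defaultdict


def bounding_box_gap(a, b):
    """Horizontal distance between the bounding boxes of two components (0 if they overlap)."""
    left_a, right_a = min(c for _, c in a), max(c for _, c in a)
    left_b, right_b = min(c for _, c in b), max(c for _, c in b)
    return max(0, max(left_a, left_b) - min(right_a, right_b))


def euclidean_gap(a, b):
    """Smallest Euclidean distance over all pixel pairs. Quadratic cost, hence slow."""
    return min(math.hypot(ra - rb, ca - cb) for ra, ca in a for rb, cb in b)


def row_gap(a, b):
    """Smallest per-row horizontal distance over the pixel rows both components occupy.

    Assumption: falls back to the bounding box gap when no row is shared.
    """
    rows_a, rows_b = defaultdict(list), defaultdict(list)
    for r, c in a:
        rows_a[r].append(c)
    for r, c in b:
        rows_b[r].append(c)
    shared = rows_a.keys() & rows_b.keys()
    if not shared:
        return bounding_box_gap(a, b)
    return min(
        max(0, max(min(rows_a[r]), min(rows_b[r])) - min(max(rows_a[r]), max(rows_b[r])))
        for r in shared
    )


def group_into_words(components, threshold, distance=bounding_box_gap):
    """Scan components left to right; start a new word when the gap exceeds threshold."""
    ordered = sorted(components, key=lambda comp: min(c for _, c in comp))
    words = []
    for comp in ordered:
        if words and distance(words[-1][-1], comp) <= threshold:
            words[-1].append(comp)
        else:
            words.append([comp])
    return words


# Three toy components: a and b are close together, c is far to the right.
a = {(0, 0), (1, 0), (1, 1)}
b = {(0, 3), (1, 3)}
c = {(0, 10), (1, 10)}
words = group_into_words([c, a, b], threshold=3)
```

With these toy components and a threshold of 3, `a` and `b` fall into one word and `c` into another. Any of the three distance functions can be passed as `distance`, which makes it easy to compare how the choice of measure changes the grouping.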