Word spotting in scanned images using hidden Markov models

Abstract
A hidden-Markov-model (HMM)-based system for font-independent spotting of user-specified keywords in a scanned image is described. Word bounding boxes of potential keywords are extracted from the image using a morphology-based preprocessor. Feature vectors based on the external shape and internal structure of the word are computed over vertical columns of pixels in a word bounding box. For each user-specified keyword, an HMM is created by concatenating appropriate context-dependent character HMMs. Nonkeywords are modeled using an HMM based on context-dependent subcharacter models. Keyword spotting is performed using a Viterbi search through the HMM network created by connecting the keyword and nonkeyword HMMs in parallel. Applications of word-image spotting include information filtering in images from facsimile and copy machines, and information retrieval from text image databases.
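
The parallel keyword/nonkeyword decision described above can be illustrated with a minimal sketch. It is not the system from the paper: the two models, their probabilities, and the names KEYWORD, FILLER, viterbi_log_score, and spot are all hypothetical, and a sequence of discrete symbols (0 or 1) stands in for the real column feature vectors. The sketch also assumes both branches of the network are entered with equal probability, so a Viterbi search over the parallel network reduces to comparing the best-path score of each branch.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the best state path that ends in the last state.

    obs   : list of discrete symbol indices (one per pixel column, toy stand-in)
    start : start[j]    = probability of starting in state j
    trans : trans[i][j] = probability of moving from state i to state j
    emit  : emit[j][o]  = probability that state j emits symbol o
    """
    n = len(start)
    # delta[j] holds the best log-score of any path ending in state j so far.
    delta = [log(start[j]) + log(emit[j][obs[0]]) for j in range(n)]
    for o in obs[1:]:
        delta = [
            max(delta[i] + log(trans[i][j]) for i in range(n)) + log(emit[j][o])
            for j in range(n)
        ]
    # Left-to-right model: a complete word must finish in the final state.
    return delta[-1]


# Hypothetical two-state left-to-right "keyword" model: the first state
# favors symbol 0, the second favors symbol 1.
KEYWORD = dict(
    start=[1.0, 0.0],
    trans=[[0.5, 0.5], [0.0, 1.0]],
    emit=[[0.9, 0.1], [0.1, 0.9]],
)

# Hypothetical one-state "nonkeyword" (filler) model with uniform emissions.
FILLER = dict(start=[1.0], trans=[[1.0]], emit=[[0.5, 0.5]])


def spot(obs):
    """Return True when the keyword branch wins the parallel Viterbi search."""
    return viterbi_log_score(obs, **KEYWORD) > viterbi_log_score(obs, **FILLER)
```

Calling spot([0, 0, 1, 1]) accepts a sequence that matches the keyword model's left-to-right pattern, while spot([1, 0, 1, 0]) is rejected because the filler branch scores higher. A real spotter would replace the toy symbols with continuous feature vectors and build each keyword model by concatenating character models, as the abstract describes.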
