Localizing and segmenting text in images and videos

7 August 2002

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Circuits and Systems for Video Technology

Vol. 12 (4), 256-268
https://doi.org/10.1109/76.999203

Abstract

Many images, especially those used for page design on Web pages, as well as videos, contain visible text. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. We propose a novel method for localizing and segmenting text in complex images and videos. Text lines are identified by using a complex-valued multilayer feed-forward network trained to detect text at a fixed scale and position. The network's output at all scales and positions is integrated into a single text-saliency map, which serves as the starting point for extracting candidate text lines. In the case of video, these candidate text lines are refined by exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed height of 100 pixels and segmented into a binary image with black characters on a white background. For videos, temporal redundancy is also exploited to improve segmentation performance. Input images and videos can be of any size due to a true multiresolution approach. Moreover, the system is able not only to locate and segment text occurrences into large binary images, but also to track each text line with sub-pixel accuracy over its entire occurrence in a video, so that one text bitmap is created for all instances of that text line. Therefore, our text segmentation results can also be used for object-based video encoding such as that enabled by MPEG-4.
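The step in which a fixed-scale detector is applied at all scales and positions and its responses are merged into one text-saliency map can be illustrated with a small sketch. This is not the authors' implementation: `contrast_score` is a hypothetical stand-in for the trained network, the window size, the scale factors, and the choice of keeping the maximum response per pixel are all assumptions made for illustration only.

```python
# Sketch only: contrast_score is a hypothetical stand-in for the trained network.
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image, values in [0, 1]

WIN_H, WIN_W = 2, 4  # fixed window size seen by the detector (assumed)


def downscale(img: Image, factor: int) -> Image:
    """Shrink an image by an integer factor using block averaging."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def contrast_score(window: Image) -> float:
    """Hypothetical fixed-scale detector: mean horizontal contrast in the window."""
    diffs = [abs(row[i + 1] - row[i]) for row in window for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def saliency_map(
    img: Image,
    scorer: Callable[[Image], float] = contrast_score,
    factors: Sequence[int] = (1, 2),
) -> Image:
    """Apply the fixed-size detector at every position of every scale and
    project each response back to the original resolution.

    Assumption: overlapping responses are merged by keeping the maximum.
    """
    height, width = len(img), len(img[0])
    saliency = [[0.0] * width for _ in range(height)]
    for f in factors:
        scaled = downscale(img, f)
        sh, sw = len(scaled), len(scaled[0])
        for y in range(sh - WIN_H + 1):
            for x in range(sw - WIN_W + 1):
                window = [row[x:x + WIN_W] for row in scaled[y:y + WIN_H]]
                score = scorer(window)
                # Map the window back to original-resolution coordinates.
                for oy in range(y * f, min((y + WIN_H) * f, height)):
                    for ox in range(x * f, min((x + WIN_W) * f, width)):
                        saliency[oy][ox] = max(saliency[oy][ox], score)
    return saliency


# Toy input: a striped, text-like patch on an otherwise uniform background.
toy = [[0.0] * 16 for _ in range(8)]
for ty in (2, 3):
    for tx in range(4, 12):
        toy[ty][tx] = float(tx % 2)
toy_saliency = saliency_map(toy)
```

In this toy run the striped patch receives a high saliency value while the uniform background stays at zero. Regions of high saliency would then be the starting point for extracting candidate text lines, as the abstract describes.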
