Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading
- 1 November 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 24 (11), 1425-1437
- https://doi.org/10.1109/tpami.2002.1046151
Abstract
This paper describes a handwritten character string recognition system for Japanese mail address reading with a very large vocabulary. The address phrases are recognized as a whole because there is no extra space between words. The lexicon contains 111,349 address phrases, which are stored in a trie structure. In recognition, the text line image is matched with the lexicon entries (phrases) to obtain reliable segmentation and retrieve valid address phrases. The paper first introduces some effective techniques for text line image preprocessing and presegmentation. In presegmentation, the text line image is separated into primitive segments by connected component analysis and by splitting touching patterns based on contour shape analysis. In lexicon matching, consecutive segments are dynamically combined into candidate character patterns. An accurate character classifier is embedded in lexicon matching to select, from a dynamic category set, the characters that match a candidate pattern. A beam search strategy controls the lexicon matching so as to achieve real-time recognition. In experiments on 3,589 live mail images, the proposed method achieved a correct rate of 83.68 percent with an error rate of less than 1 percent.