Hidden tree Markov models for document image classification

2 April 2003

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 25 (4), pp. 520-524
https://doi.org/10.1109/tpami.2003.1190578

Abstract

Classification is an important problem in image document processing and is often a preliminary step toward recognition, understanding, and information extraction. In this paper, the problem is formulated in the framework of concept learning, where each category corresponds to a set of image documents with similar physical structure. We propose a solution based on two algorithmic ideas. First, we obtain a structured representation of images based on labeled XY-trees (this representation informs the learner about important relationships between image subconstituents). Second, we propose a probabilistic architecture that extends hidden Markov models for learning probability distributions defined on spaces of labeled trees. Finally, we present a successful application of this method to the categorization of commercial invoices.
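The second idea, a hidden Markov model generalized from sequences to labeled trees, can be illustrated with a minimal sketch of likelihood computation and maximum-likelihood classification. It rests on assumptions that do not come from the paper: node labels are small integers standing in for XY-tree node labels; a single transition matrix is shared by all children, whereas the original architecture may condition transitions on child position; parameters are set by hand rather than learned; and each class gets its own model, with the decision taken by maximum likelihood under equal class priors. The names `Node`, `HiddenTreeMarkovModel`, and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a labeled tree, e.g. an XY-tree region (integer label encoding is hypothetical)."""
    label: int
    children: list = field(default_factory=list)


@dataclass
class HiddenTreeMarkovModel:
    """Top-down generative tree model: hidden states flow parent -> child,
    and each node's label is emitted from that node's own hidden state."""
    prior: list  # prior[i] = P(root state = i)
    trans: list  # trans[i][j] = P(child state = j | parent state = i); shared by all children (simplification)
    emit: list   # emit[i][k] = P(node label = k | node state = i)

    def _upward(self, node):
        """Return (beta, log_scale) with P(subtree labels | node state = i) = beta[i] * exp(log_scale)."""
        n = len(self.prior)
        beta = [self.emit[i][node.label] for i in range(n)]
        log_scale = 0.0
        for child in node.children:
            child_beta, child_log = self._upward(child)
            log_scale += child_log
            for i in range(n):
                # Marginalize over the child's hidden state given parent state i
                beta[i] *= sum(self.trans[i][j] * child_beta[j] for j in range(n))
        total = sum(beta)  # rescale to sum to 1 so deep trees do not underflow
        return [b / total for b in beta], log_scale + math.log(total)

    def log_likelihood(self, root):
        """log P(tree) = log sum_i prior[i] * P(tree | root state = i)."""
        beta, log_scale = self._upward(root)
        return log_scale + math.log(sum(p * b for p, b in zip(self.prior, beta)))


def classify(tree, models):
    """Return the class name whose model assigns the tree the highest log-likelihood
    (maximum-likelihood decision, i.e. equal class priors assumed)."""
    return max(models, key=lambda name: models[name].log_likelihood(tree))


if __name__ == "__main__":
    # Two toy document classes with hand-set (not learned) parameters
    models = {
        "invoice_a": HiddenTreeMarkovModel(
            prior=[0.5, 0.5],
            trans=[[0.7, 0.3], [0.4, 0.6]],
            emit=[[0.9, 0.1], [0.2, 0.8]],
        ),
        "invoice_b": HiddenTreeMarkovModel(
            prior=[0.5, 0.5],
            trans=[[0.7, 0.3], [0.4, 0.6]],
            emit=[[0.1, 0.9], [0.8, 0.2]],
        ),
    }
    page = Node(0, [Node(1), Node(0, [Node(0)])])
    print(classify(page, models))
```

The upward pass generalizes the HMM backward recursion: when every node has exactly one child, it reduces to the ordinary sequence HMM likelihood. Per-node rescaling keeps the computation in a safe numeric range. Learning the parameters from labeled document trees would need an EM procedure built on the same recursions, which this sketch omits.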
