Hidden tree Markov models for document image classification
- 2 April 2003
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 25 (4), 520-524
- https://doi.org/10.1109/tpami.2003.1190578
Abstract
Classification is an important problem in image document processing and is often a preliminary step toward recognition, understanding, and information extraction. In this paper, the problem is formulated in the framework of concept learning and each category corresponds to the set of image documents with similar physical structure. We propose a solution based on two algorithmic ideas. First, we obtain a structured representation of images based on labeled XY-trees (this representation informs the learner about important relationships between image subconstituents). Second, we propose a probabilistic architecture that extends hidden Markov models for learning probability distributions defined on spaces of labeled trees. Finally, a successful application of this method to the categorization of commercial invoices is presented.
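As a rough illustration of the XY-tree representation mentioned in the abstract, the following minimal sketch applies the classic recursive XY-cut idea: a binary page image is split along whitespace gaps found in its row and column projection profiles, alternating direction at each level. It is not the authors' implementation. The names `Node`, `_runs`, and `xy_cut`, the toy image, and the labels `"h-cut"`, `"v-cut"`, and `"leaf"` are assumptions made for this example, and the paper's actual node labeling scheme is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


@dataclass
class Node:
    box: Box
    label: str  # "h-cut", "v-cut", or "leaf" (illustrative labels only)
    children: List["Node"] = field(default_factory=list)


def _runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of consecutive non-zero profile entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def xy_cut(img: List[List[int]], box: Optional[Box] = None,
           horizontal: bool = True, tried_other: bool = False) -> Node:
    """Recursively split a binary image (1 = ink) along whitespace gaps."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box
    if horizontal:
        # Ink count per row: empty rows mark horizontal cut lines.
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    else:
        # Ink count per column: empty columns mark vertical cut lines.
        profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    runs = _runs(profile)
    if len(runs) <= 1:
        # No gap in this direction: try the other one once, else stop.
        if tried_other:
            return Node(box, "leaf")
        return xy_cut(img, box, not horizontal, tried_other=True)
    node = Node(box, "h-cut" if horizontal else "v-cut")
    for start, end in runs:
        if horizontal:
            sub = (top + start, left, top + end, right)
        else:
            sub = (top, left + start, bottom, left + end)
        node.children.append(xy_cut(img, sub, not horizontal))
    return node


# Toy page: two blocks side by side, a blank row, then one full-width block.
img = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
tree = xy_cut(img)
```

On the toy page the root is a horizontal cut whose first child is split again by a vertical cut, while the second child is a leaf. A tree of this kind, with labels attached to its nodes, is the sort of structured input over which a tree-structured probabilistic model could be defined.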
This publication has 10 references indexed in Scilit:
- Initial learning of document structure. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Modeling documents for structure recognition using generalized N-grams. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Clustering and classification of document structure-a machine learning approach. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Automatic document classification and indexing in high-volume applications. International Journal on Document Analysis and Recognition (IJDAR), 2001
- Classification of document pages using structure-based features. International Journal on Document Analysis and Recognition (IJDAR), 2001
- Structured document segmentation and representation by the modified X-Y tree. Published by Institute of Electrical and Electronics Engineers (IEEE), 1999
- A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 1998
- Probabilistic Independence Networks for Hidden Markov Probability Models. Neural Computation, 1997
- Bayesian Networks for Data Mining. Data Mining and Knowledge Discovery, 1997
- Bayesian Belief Networks as a tool for stochastic parsing. Speech Communication, 1995