Clustering and classification of document structure - a machine learning approach
- 19 November 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2, 587-591
- https://doi.org/10.1109/icdar.1995.601965
Abstract
We describe a system that learns how the logical structures of documents are presented, using business letters as the example domain. Given a set of instances, the system clusters them into structural concepts and induces a concept hierarchy. This hierarchy then serves as the basis for classifying future input. The paper introduces the individual learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports on the results.
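The abstract does not specify the algorithms used, so the following is only a minimal sketch of the general idea (cluster instances into a hierarchy, then label new input by descending it), not the authors' method. It substitutes plain agglomerative clustering over centroids. The feature vectors (normalized x/y position of a layout block), the label names, and all function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import dist


@dataclass
class Node:
    """One concept in the hierarchy (hypothetical representation)."""
    centroid: tuple
    size: int
    labels: Counter
    children: list = field(default_factory=list)


def merge(a, b):
    """Create a parent concept whose centroid is the size-weighted mean."""
    size = a.size + b.size
    centroid = tuple((x * a.size + y * b.size) / size
                     for x, y in zip(a.centroid, b.centroid))
    return Node(centroid, size, a.labels + b.labels, [a, b])


def build_hierarchy(instances):
    """Agglomerative clustering: repeatedly merge the two closest concepts."""
    nodes = [Node(tuple(vec), 1, Counter([label])) for vec, label in instances]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes))
                 for j in range(i + 1, len(nodes)))
        i, j = min(pairs, key=lambda p: dist(nodes[p[0]].centroid,
                                             nodes[p[1]].centroid))
        parent = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


def classify(root, vec):
    """Descend towards the nearest child until the concept has a single label."""
    node = root
    while node.children and len(node.labels) > 1:
        node = min(node.children, key=lambda c: dist(c.centroid, vec))
    return node.labels.most_common(1)[0][0]


# Toy training instances: (normalized x, y of a layout block, logical label).
# These values are invented for the example, not taken from the paper.
training = [
    ((0.10, 0.05), "sender"), ((0.12, 0.06), "sender"),
    ((0.80, 0.20), "date"), ((0.78, 0.22), "date"),
    ((0.10, 0.25), "recipient"), ((0.11, 0.27), "recipient"),
    ((0.10, 0.50), "body"), ((0.10, 0.55), "body"),
]
root = build_hierarchy(training)
print(classify(root, (0.79, 0.19)))
```

Stopping the descent at the first single-label concept, rather than always going down to a leaf, keeps the labeling decision at the level of a learned concept instead of a single training instance.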