Initial learning of document structure
- 30 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Proposes an approach for automatically generating a decision tree that serves as a model for the logical labeling of business letters. Instead of determining the discriminating attributes top-down, the system inspects a finite set of document instances that are presented to a learner bottom-up. The learner itself finds local similarities, rates them with respect to the overall structure, and determines the best structural match of two instances (their neighborhood). The decision tree is grown step by step, with each subtree deduced by forming a generalization from a neighborhood. Consequently, the system learns heuristics that structurally discriminate documents during subsequent classification.
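The abstract does not give a concrete document representation or scoring function, so the following is only a minimal Python sketch of the bottom-up idea under stated assumptions, not the paper's algorithm. It assumes each letter is a hypothetical set of (attribute, value) layout pairs, rates local similarity with the Jaccard index, treats the highest-scoring pair as the neighborhood, and generalizes a neighborhood by keeping the pairs both instances share. The names `grow_tree`, `classify`, and the attributes in `LETTERS` are illustrative.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional

# Assumption: a document instance is a frozenset of (attribute, value)
# layout pairs, e.g. ("date", "top_right"). The paper's real
# representation is not given in the abstract.
Features = frozenset


@dataclass
class Node:
    pattern: Features                  # generalization shared by all leaves below
    label: Optional[str] = None        # document class, set only on leaves
    children: list["Node"] = field(default_factory=list)


def similarity(a: Features, b: Features) -> float:
    """Rate local matches against the overall structure of both
    instances: shared pairs over all pairs (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generalize(a: Features, b: Features) -> Features:
    """Keep only the attribute-value pairs both instances agree on."""
    return a & b


def grow_tree(instances: list[tuple[Features, str]]) -> Node:
    """Grow the tree bottom-up: repeatedly merge the best-matching pair
    (the neighborhood) into a subtree labeled with its generalization."""
    forest = [Node(pattern=f, label=lbl) for f, lbl in instances]
    while len(forest) > 1:
        i, j = max(
            combinations(range(len(forest)), 2),
            key=lambda p: similarity(forest[p[0]].pattern, forest[p[1]].pattern),
        )
        left, right = forest[i], forest[j]
        parent = Node(pattern=generalize(left.pattern, right.pattern),
                      children=[left, right])
        forest.pop(j)  # j > i, so pop j first to keep index i valid
        forest.pop(i)
        forest.append(parent)
    return forest[0]


def classify(node: Node, doc: Features) -> Optional[str]:
    """Descend greedily into the child whose pattern best matches doc."""
    while node.children:
        node = max(node.children, key=lambda c: similarity(c.pattern, doc))
    return node.label


# Hypothetical training letters; attribute names are illustrative only.
LETTERS = [
    (frozenset({("sender", "top_left"), ("date", "top_right"), ("subject", "bold")}), "invoice"),
    (frozenset({("sender", "top_left"), ("date", "top_right"), ("subject", "plain")}), "invoice"),
    (frozenset({("sender", "top_center"), ("date", "bottom"), ("subject", "bold")}), "offer"),
]

if __name__ == "__main__":
    tree = grow_tree(LETTERS)
    query = frozenset({("sender", "top_left"), ("date", "top_right"), ("subject", "italic")})
    print(classify(tree, query))  # prints "invoice"
```

Merging by maximum Jaccard score is just one plausible reading of "best structural match"; the paper may rate similarities differently, and a practical learner would likely need a stopping rule instead of merging everything under a single root.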