Initial learning of document structure

30 December 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 86-90
https://doi.org/10.1109/icdar.1993.395776

Abstract

This paper proposes an approach for automatically generating a decision tree that serves as a model for the logical labeling of business letters. Instead of determining the discriminating attributes top-down, the system inspects a finite set of document instances that are presented to a learner in bottom-up fashion. The learner itself identifies local similarities, rates them with respect to the overall structure, and determines the best structural match between two instances (a neighborhood). The decision tree is grown step by step, with each subtree deduced by forming a generalization from a neighborhood. Consequently, the system learns heuristics for structurally discriminating documents, which are then applied during subsequent classification.
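The abstract names no concrete data structures, so the following is only a minimal Python sketch of the bottom-up idea under our own assumptions. Each letter is reduced to a flat attribute-to-value description of its layout (the attribute names, values, and the helpers `similarity`, `generalize`, `grow_tree`, and `classify` are all hypothetical). Similarity is a plain count of agreeing attributes, and generalization replaces disagreeing values with a wildcard. The result is a binary generalization hierarchy built by agglomerative merging: a stand-in for the paper's decision-tree learner, not a reproduction of it.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Optional

WILDCARD = "*"  # marks an attribute whose value was generalized away


@dataclass
class Node:
    # Hypothetical layout description: attribute -> value (or WILDCARD).
    pattern: Dict[str, str]
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None  # set only on leaves (the original instances)


def similarity(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count attributes on which both descriptions share a concrete value."""
    return sum(1 for k in a.keys() & b.keys() if a[k] == b[k] and a[k] != WILDCARD)


def generalize(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Keep values both sides agree on; replace disagreements or gaps with WILDCARD."""
    return {k: a[k] if a.get(k) == b.get(k) else WILDCARD for k in a.keys() | b.keys()}


def grow_tree(instances: List[Node]) -> Node:
    """Bottom-up: repeatedly merge the most similar pair (the 'neighborhood')."""
    nodes = list(instances)
    while len(nodes) > 1:
        # combinations yields index pairs (i, j) with i < j; ties keep the first pair.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: similarity(nodes[p[0]].pattern, nodes[p[1]].pattern),
        )
        parent = Node(generalize(nodes[i].pattern, nodes[j].pattern),
                      children=[nodes[i], nodes[j]])
        nodes.pop(j)  # pop the larger index first so index i stays valid
        nodes.pop(i)
        nodes.append(parent)
    return nodes[0]


def classify(root: Node, doc: Dict[str, str]) -> Optional[str]:
    """Descend from the root, always following the child most similar to doc."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: similarity(c.pattern, doc))
    return node.label


# Usage with three made-up letter layouts.
letters = [
    Node({"top": "sender", "middle": "date", "bottom": "signature"}, label="letter_a"),
    Node({"top": "sender", "middle": "date", "bottom": "enclosure"}, label="letter_b"),
    Node({"top": "logo", "middle": "subject", "bottom": "signature"}, label="letter_c"),
]
root = grow_tree(letters)
print(root.children[1].pattern)  # generalization of letter_a and letter_b
print(classify(root, {"top": "sender", "middle": "date", "bottom": "signature"}))
```

Merging only the single best-matching pair at each step mirrors the abstract's notion of a neighborhood. The abstract also says local similarities are rated against the overall structure; this sketch omits any such weighting and treats every attribute equally.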
