Extracting semi-structured data through examples

1 November 1999

conference paper
Published by Association for Computing Machinery (ACM)

p. 94-101
https://doi.org/10.1145/319950.319962

Abstract

In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform the extraction of new objects, we introduce a bottom-up extraction strategy and, through experimentation, demonstrate that it works quite effectively with distinct Web sources, even if only a few examples are provided by the user.
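
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that attribute values can be located by the few characters surrounding each user-supplied example, and that a "bottom-up" pass means finding atomic values first and grouping them into objects afterwards. All function names, the `width` parameter, and the sample pages are hypothetical.

```python
import re

def learn_contexts(training_page, examples, width=3):
    """Record the (prefix, suffix) text around each example value (assumed heuristic)."""
    contexts = set()
    for value in examples:
        start = training_page.find(value)
        if start < 0:
            continue  # example not present on the training page
        end = start + len(value)
        prefix = training_page[max(0, start - width):start]
        suffix = training_page[end:end + width]
        if prefix and suffix:
            contexts.add((prefix, suffix))
    return contexts

def find_values(page, contexts):
    """Return sorted (position, value) pairs for text enclosed by a learned context."""
    found = set()
    for prefix, suffix in contexts:
        pattern = re.escape(prefix) + r"(.+?)" + re.escape(suffix)
        for match in re.finditer(pattern, page):
            found.add((match.start(1), match.group(1)))
    return sorted(found)

def extract_objects(training_page, examples_by_attr, new_page, width=3):
    """Bottom-up pass: find atomic values first, then group them into objects."""
    hits = []
    for attr, examples in examples_by_attr.items():
        contexts = learn_contexts(training_page, examples, width)
        hits += [(pos, attr, val) for pos, val in find_values(new_page, contexts)]
    objects, current = [], {}
    for _, attr, val in sorted(hits):
        if attr in current:  # a repeated attribute starts a new object
            objects.append(current)
            current = {}
        current[attr] = val
    if current:
        objects.append(current)
    return objects

# Hypothetical sample data: one example object is taken from the training page.
training = "<li><b>Dune</b> by <i>Herbert</i></li>"
examples = {"title": ["Dune"], "author": ["Herbert"]}
new_page = ("<li><b>Ulysses</b> by <i>Joyce</i></li>"
            "<li><b>Walden</b> by <i>Thoreau</i></li>")
books = extract_objects(training, examples, new_page)
```

In this sketch a single example object is enough to recover both records from the new page because the markup around each value is regular; less uniform sources would likely need more examples or wider contexts.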
