Extracting semi-structured data through examples
- 1 November 1999
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform the extraction of new objects, we introduce a bottom-up extraction strategy and, through experimentation, demonstrate that it works quite effectively with distinct Web sources, even if only a few examples are provided by the user.
This publication has 11 references indexed in Scilit:
- Ontology-based extraction and structuring of information from data-rich unstructured documents. Association for Computing Machinery (ACM), 1998
- NoDoSE—a tool for semi-automatically extracting structured and semistructured data from text documents. Association for Computing Machinery (ACM), 1998
- A Conceptual-Modeling Approach to Extracting Data from the Web. Springer Nature, 1998
- Wrapper generation for semi-structured Internet sources. ACM SIGMOD Record, 1997
- Semistructured data. Association for Computing Machinery (ACM), 1997
- Cut and paste. Association for Computing Machinery (ACM), 1997
- Passage retrieval revisited. Association for Computing Machinery (ACM), 1997
- Template-based wrappers in the TSIMMIS system. Association for Computing Machinery (ACM), 1997
- TINTIN. Association for Computing Machinery (ACM), 1997
- Information extraction. Communications of the ACM, 1996