Jedi: extracting and synthesizing information from the Web
- 1 January 1998
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Jedi (Java based Extraction and Dissemination of Information) is a lightweight tool for the creation of wrappers and mediators to extract, combine, and reconcile information from several independent information sources. For wrappers it uses attributed grammars, which are evaluated with a fault-tolerant parsing strategy to cope with ambiguous grammars and irregular sources. For mediation it uses a simple generic object-model that can be extended with Java-libraries for specific models such as HTML, XML or the relational model. This paper describes the architecture of Jedi, and then focuses on Jedi's wrapper generator.
This publication has 6 references indexed in Scilit:
- NoDoSE—a tool for semi-automatically extracting structured and semistructured data from text documents. ACM SIGMOD Record, 1998
- Error tolerant document structure analysis. International Journal on Digital Libraries, 1998
- Wrapper generation for semi-structured Internet sources. ACM SIGMOD Record, 1997
- Cut and paste. Published by Association for Computing Machinery (ACM), 1997
- Querying documents in object databases. International Journal on Digital Libraries, 1997
- Template-based wrappers in the TSIMMIS system. Published by Association for Computing Machinery (ACM), 1997