Querying the World Wide Web
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depends on sending information retrieval requests to "index servers". One problem with this is that these queries cannot exploit the structure and topology of the document network. The authors propose a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries. They give a formal semantics for WebSQL using a calculus based on a novel "virtual graph" model of a document network. They propose a new theory of query cost based on the idea of "query locality," that is, how much of the network must be visited to answer a particular query. Finally, they describe a prototype implementation of WebSQL written in Java.
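The idea of "query locality" can be illustrated with a small sketch. The Python below is not WebSQL and is not taken from the paper. It is a toy model built on assumptions: the `WEB` dictionary, the example URLs, the `query` function, and the same-host test for "local" links are all hypothetical. It only shows how combining a text condition with a restriction on which links may be followed changes how many documents must be visited to answer a query.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy network (an assumption for illustration only):
# each URL maps to (document text, list of outgoing hyperlinks).
WEB = {
    "http://a.example/index": (
        "home page about databases",
        ["http://a.example/papers", "http://b.example/"],
    ),
    "http://a.example/papers": (
        "papers on query languages",
        ["http://a.example/index"],
    ),
    "http://b.example/": (
        "another site on query languages",
        ["http://c.example/"],
    ),
    "http://c.example/": ("unrelated page", []),
}


def is_local(src, dst):
    """Treat a link as local when both ends share a host (an assumption)."""
    return urlparse(src).netloc == urlparse(dst).netloc


def query(start, keyword, local_only):
    """Breadth-first search from `start` for documents containing `keyword`.

    Returns the matching URLs and the number of documents visited,
    which serves here as a crude stand-in for query cost.
    """
    seen = {start}
    queue = deque([start])
    matches = []
    visited = 0
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        visited += 1
        if keyword in text:
            matches.append(url)
        for nxt in links:
            if nxt in seen:
                continue
            if local_only and not is_local(url, nxt):
                continue  # do not leave the starting site
            seen.add(nxt)
            queue.append(nxt)
    return matches, visited


if __name__ == "__main__":
    print(query("http://a.example/index", "query languages", local_only=True))
    print(query("http://a.example/index", "query languages", local_only=False))
```

In this toy network, the query restricted to local links visits only the two documents on the starting host, while the unrestricted query visits all four. The count of visited documents is a simplification chosen for this sketch, not the cost measure defined by the authors.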