Performance analysis of three text-join algorithms
- 1 January 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 10 (3), 477-492
- https://doi.org/10.1109/69.687979
Abstract
When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: joins between attributes of textual type. This paper presents three algorithms for processing such joins and analyzes their I/O costs. Because such joins often involve very large document collections, it is important to find efficient algorithms to process them. The three algorithms differ in whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and the simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics, and the input query. For each algorithm, the type of input document collections with which the algorithm is likely to perform well is identified. An integrated algorithm that automatically selects the best algorithm to use is also proposed.
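The distinction the abstract draws, between joining on the documents themselves and joining through inverted files, can be illustrated with a small in-memory sketch. This is not a reproduction of the paper's three algorithms or of its I/O cost model. The function names, the whitespace tokenizer, the term-frequency dot-product similarity, and the threshold-based join condition are all assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real text analysis.
    return text.lower().split()


def similarity(a, b):
    # Assumption: dot product of term-frequency vectors as a simple similarity.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    return sum(ca[t] * cb[t] for t in ca if t in cb)


def join_by_documents(c1, c2, threshold=1):
    # Document-based join: compare every pair of documents directly.
    return {
        (i, j)
        for i, d1 in enumerate(c1)
        for j, d2 in enumerate(c2)
        if similarity(d1, d2) >= threshold
    }


def build_inverted_file(collection):
    # Map each term to {doc_id: term frequency} for the documents containing it.
    index = defaultdict(dict)
    for doc_id, text in enumerate(collection):
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def join_by_inverted_file(c1, c2, threshold=1):
    # Inverted-file join: for each document in c1, accumulate scores only for
    # the c2 documents that share at least one term with it.
    # Matches join_by_documents only when threshold >= 1, because pairs with
    # no shared term are never scored here.
    index = build_inverted_file(c2)
    result = set()
    for i, d1 in enumerate(c1):
        scores = defaultdict(int)
        for term, tf in Counter(tokenize(d1)).items():
            for j, tf2 in index.get(term, {}).items():
                scores[j] += tf * tf2
        result.update((i, j) for j, s in scores.items() if s >= threshold)
    return result
```

Both functions return the same set of matching document pairs for any threshold of at least 1. They differ in how much data they touch: the first reads every document pair, while the second reads only the postings for terms that actually occur in the probing document. That difference in access pattern is the kind of trade-off the abstract says depends on the collections, the system, and the query.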