Minimal probing
- 3 June 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 346-357
- https://doi.org/10.1145/564691.564731
Abstract
This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and must instead require per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects, before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines if a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.Keywords
This publication has 12 references indexed in Scilit:
- Evaluating top-k queries over Web-accessible databasesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- PREFERPublished by Association for Computing Machinery (ACM) ,2001
- Optimal aggregation algorithms for middlewarePublished by Association for Computing Machinery (ACM) ,2001
- A framework for expressing and combining preferencesPublished by Association for Computing Machinery (ACM) ,2000
- Integration of heterogeneous databases without common domains using queries based on textual similarityPublished by Association for Computing Machinery (ACM) ,1998
- On saying “Enough already!” in SQLPublished by Association for Computing Machinery (ACM) ,1997
- Combining fuzzy information from multiple systems (extended abstract)Published by Association for Computing Machinery (ACM) ,1996
- Optimizing queries over multimedia repositoriesPublished by Association for Computing Machinery (ACM) ,1996
- Why decision support fails and how to fix itACM SIGMOD Record, 1995
- Fuzzy setsInformation and Control, 1965