Efficient passage ranking for document databases

1 October 1999

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems

Vol. 17 (4), 406-439
https://doi.org/10.1145/326440.326445

Abstract

Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can improve effectiveness and identify relevant material in documents that are too large for users to consider as a whole. However, ranking of passages can considerably increase retrieval costs. In this article we explore alternative query evaluation techniques, and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new “DO-TOS” passage-ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.
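
As a rough illustration of what ranking passages rather than whole documents means, the sketch below scores every overlapping window of words against a query. It is not the DO-TOS algorithm from the article, whose details the abstract does not give. The fixed-length window definition, the TF-IDF-style weighting, and the names `split_passages` and `rank_passages` are all assumptions made for this example.

```python
import math
from collections import Counter


def split_passages(tokens, size, stride):
    """Cut a token list into overlapping windows (an assumed passage definition)."""
    out = []
    for start in range(0, len(tokens), stride):
        out.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the tail of the document is now covered
    return out


def rank_passages(docs, query, size=50, stride=25, top_k=3):
    """Return up to top_k (score, doc_id, passage_index, text) tuples, best first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency of each term, counted once per document.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    q_terms = query.lower().split()
    scored = []
    for doc_id, tokens in tokenized.items():
        for idx, passage in enumerate(split_passages(tokens, size, stride)):
            tf = Counter(passage)
            # Illustrative weighting only: damped term frequency times a smoothed IDF.
            score = sum(
                math.log(1 + tf[t]) * math.log(1 + n_docs / df[t])
                for t in q_terms
                if tf[t]
            )
            if score > 0:
                scored.append((score, doc_id, idx, " ".join(passage)))

    scored.sort(key=lambda row: -row[0])
    return scored[:top_k]


if __name__ == "__main__":
    docs = {
        "a": "cats sleep all day long and dogs bark at night",
        "b": "inverted files support fast text retrieval over large collections",
    }
    for row in rank_passages(docs, "text retrieval", size=4, stride=2):
        print(row)
```

This exhaustive version scores every passage of every document, which is the kind of added cost the abstract warns about; reducing that cost is what the article's query evaluation techniques are concerned with.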
