Automatic discovery of language models for text databases
- 1 June 1999
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 28 (2), 479-490
- https://doi.org/10.1145/304181.304224
Abstract
The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
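The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, whose details are not given here. It assumes a hypothetical `search(term, k)` callable that returns up to `k` document strings, whitespace tokenization, and a query-selection rule that picks the next query term at random from the vocabulary learned so far. The function name, parameters, and defaults are all illustrative.

```python
import random
from collections import Counter


def sample_language_model(search, seed_term, max_docs=300, docs_per_query=4, rng=None):
    """Approximate a database's language model by query-based sampling.

    `search(term, k)` is a hypothetical interface (an assumption of this
    sketch): it returns up to k documents, as strings, matching `term`.
    Returns (document-frequency Counter, number of distinct documents sampled).
    """
    rng = rng or random.Random(0)
    model = Counter()   # term -> number of sampled documents containing it
    seen = set()        # documents already sampled
    used = set()        # query terms already issued
    term = seed_term
    while term is not None and len(seen) < max_docs:
        used.add(term)
        for doc in search(term, docs_per_query):
            if doc in seen:
                continue
            seen.add(doc)
            # Count each term once per document (document frequency).
            model.update(set(doc.lower().split()))
            if len(seen) >= max_docs:
                break
        # Assumed selection rule: next query is a random unused learned term.
        candidates = sorted(t for t in model if t not in used)
        term = rng.choice(candidates) if candidates else None
    return model, len(seen)
```

With a toy in-memory collection standing in for a remote database, the function can be called as `sample_language_model(search, "apple")`, where `search` filters a list of strings by term. The loop stops once the document budget is reached or no unused learned terms remain.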