Cross-lingual relevance models
- 11 August 2002
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 175-182
- https://doi.org/10.1145/564376.564408
Abstract
We propose a formal model of Cross-Language Information Retrieval that does not rely on either query translation or document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting with a query in the source language. The model integrates popular techniques of disambiguation and query expansion in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC9. The model achieves performance around 95% of the strong mono-lingual baseline in terms of average precision. In initial precision, our model outperforms the mono-lingual baseline by 20%. The main contribution of this work is the unified formal model which integrates techniques that are essential for effective Cross-Language Retrieval.
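The abstract gives no formulas, so the following is only a minimal sketch of how a target-language topic model might be estimated from a source-language query with a parallel corpus. It assumes each aligned document pair is weighted by how likely its source side is to generate the query, and that target-side word frequencies are then averaged under those weights. The function names, the `lam` smoothing parameter (Jelinek-Mercer smoothing is an assumption here), and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def smoothed_prob(word, doc_counts, doc_len, bg_counts, bg_len, lam=0.5):
    """Jelinek-Mercer smoothing: mix document and collection frequencies."""
    p_doc = doc_counts[word] / doc_len if doc_len else 0.0
    p_bg = bg_counts[word] / bg_len if bg_len else 0.0
    return lam * p_doc + (1.0 - lam) * p_bg


def target_relevance_model(query, parallel_pairs, lam=0.5):
    """Estimate P(w | query) over target-language words (illustrative only).

    query          -- list of source-language tokens
    parallel_pairs -- list of (source_tokens, target_tokens) aligned documents
    """
    # Background model over the source side of the corpus.
    src_bg = Counter(tok for src, _ in parallel_pairs for tok in src)
    src_bg_len = sum(src_bg.values())

    scores = defaultdict(float)
    total_weight = 0.0
    for src, tgt in parallel_pairs:
        if not tgt:
            continue
        src_counts = Counter(src)
        # Weight of this pair: likelihood of the query under its source side.
        weight = 1.0
        for q in query:
            weight *= smoothed_prob(q, src_counts, len(src), src_bg, src_bg_len, lam)
        if weight == 0.0:
            continue
        total_weight += weight
        # Spread the weight over the target side by relative frequency.
        for w, c in Counter(tgt).items():
            scores[w] += weight * c / len(tgt)

    if total_weight == 0.0:
        return {}
    # Normalize so the result is a probability distribution.
    return {w: s / total_weight for w, s in scores.items()}


# Toy aligned corpus (made-up data for illustration).
pairs = [
    (["oil", "price"], ["石油", "价格"]),
    (["football", "match"], ["足球", "比赛"]),
]
model = target_relevance_model(["oil"], pairs)
```

In this sketch no query term is ever translated directly: target-language words gain probability only through documents whose source side matches the query, which is one way to read the abstract's claim that disambiguation and expansion arise inside a single model.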