Novelty and redundancy detection in adaptive filtering
- 11 August 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
This paper addresses the problem of extending an adaptive information filtering system to make decisions about the novelty and redundancy of relevant documents. It argues that relevance and redundancy should each be modelled explicitly and separately. A set of five redundancy measures is proposed and evaluated in experiments with and without redundancy thresholds. The experimental results demonstrate that the cosine similarity metric and a redundancy measure based on a mixture of language models are both effective for identifying redundant documents.
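As a rough illustration of the cosine-based measure named in the abstract, the sketch below scores a newly arrived relevant document against documents already delivered and flags it as redundant when the highest similarity reaches a threshold. The whitespace tokenization, raw term-count weighting, use of the maximum over delivered documents, the function names, and the threshold value of 0.8 are all assumptions made for this example; the abstract does not specify them, and this is not presented as the paper's implementation.

```python
import math
from collections import Counter


def term_vector(text):
    """Map lowercased whitespace tokens to raw term counts (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def redundancy_score(new_doc, delivered_docs):
    """Highest cosine similarity between new_doc and any delivered document.

    Taking the maximum is an assumption of this sketch; an empty history
    yields a score of 0.0, so the first document is never redundant.
    """
    new_vec = term_vector(new_doc)
    return max(
        (cosine(new_vec, term_vector(doc)) for doc in delivered_docs),
        default=0.0,
    )


def is_redundant(new_doc, delivered_docs, threshold=0.8):
    """Flag new_doc as redundant when its score reaches the (hypothetical) threshold."""
    return redundancy_score(new_doc, delivered_docs) >= threshold
```

In a filtering loop, a document judged relevant would be delivered and appended to the history only when `is_redundant` returns False. The threshold would need tuning on held-out data.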
This publication has 8 references indexed in Scilit:
- Model-based feedback in the language modeling approach to information retrieval. Published by Association for Computing Machinery (ACM), 2001
- Combining semantic and syntactic document classifiers to improve first story detection. Published by Association for Computing Machinery (ACM), 2001
- A study of smoothing methods for language models applied to Ad Hoc information retrieval. Published by Association for Computing Machinery (ACM), 2001
- Maximum likelihood estimation for filtering thresholds. Published by Association for Computing Machinery (ACM), 2001
- First story detection in TDT is hard. Published by Association for Computing Machinery (ACM), 2000
- A hidden Markov model information retrieval system. Published by Association for Computing Machinery (ACM), 1999
- Measures of distributional similarity. Published by Association for Computational Linguistics (ACL), 1999
- On-line new event detection and tracking. Published by Association for Computing Machinery (ACM), 1998