INCLUSION OF RELEVANCE INFORMATION IN THE TERM DISCRIMINATION MODEL
- 1 February 1989
- journal article
- review article
- Published by Emerald Publishing in Journal of Documentation
- Vol. 45 (2) , 85-109
- https://doi.org/10.1108/eb026840
Abstract
The term discrimination value of an index term has been proposed as a quantitative measure of the extent to which that term can discriminate between documents in bibliographic databases. Previous work has suggested that the most discriminating terms are those with medium frequencies of occurrence. This paper discusses the effect of including relevance data on the calculation of term discrimination values. Two algorithms are described that calculate the ability of index terms to discriminate between relevant documents, between non‐relevant documents or between relevant and non‐relevant documents. The application of these algorithms to several standard document test collections demonstrates that the exact form of the relationship between term frequency and term discrimination depends upon the particular type of discrimination which is being measured; in particular, medium frequency terms are not necessarily the best discriminators when relevance data is available. These results are compared with the discriminatory ability of terms as measured by their relevance weights, where the most discriminating terms are those with low frequencies of occurrence.
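The abstract does not give the paper's two relevance-based algorithms, so they are not reproduced here. As background only, the following is a minimal sketch of the classic term discrimination value without relevance data, assuming the common centroid-based formulation: the density of the document space is the average cosine similarity between each document and the collection centroid, and a term's discrimination value is the density with the term removed minus the density with it present. The function names (`cosine`, `density`, `discrimination_value`) and the toy term-weight matrix are hypothetical, chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def density(docs):
    """Average similarity between each document vector and the collection centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(doc, centroid) for doc in docs) / n


def discrimination_value(docs, term):
    """Density with `term` removed minus density with it present.

    Assumed sign convention: a positive value means the term spreads
    documents apart (a good discriminator); a negative value means it
    pulls them together (a poor discriminator).
    """
    reduced = [[w for j, w in enumerate(doc) if j != term] for doc in docs]
    return density(reduced) - density(docs)


# Toy collection (an assumption for illustration): 4 documents x 3 terms.
# Term 0 occurs in every document, term 1 in half, term 2 in one.
docs = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]

values = [discrimination_value(docs, t) for t in range(3)]
for t, dv in enumerate(values):
    print(f"term {t}: discrimination value = {dv:+.4f}")
```

On this toy matrix the term that occurs in every document receives a negative value, while the term that occurs in half of the documents receives the largest value, which matches the earlier finding about medium-frequency terms that the abstract mentions. The relevance-aware variants discussed in the paper would presumably restrict or partition the document set by relevance judgements before measuring density, but that detail is an assumption and is not stated in the abstract.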