Techniques for the measurement of clustering tendency in document retrieval systems
- 1 December 1987
- journal article
- Published by SAGE Publications in Journal of Information Science
- Vol. 13 (6) , 361-365
- https://doi.org/10.1177/016555158701300607
Abstract
The use of automatic classification techniques has been suggested as a means of increasing the effectiveness of document retrieval systems; however, the automatic generation of a classification requires a large amount of computation, and it is thus of importance to know whether this computation will result in material increases in retrieval performance. This paper describes three methods - the overlap test, the nearest neighbour test and the density test - which can be used to measure the degree of clustering tendency in a set of documents. It is shown that the three tests are not in complete agreement with each other in their evaluation of the degree of clustering tendency present in seven document test collections. A comparison of the predicted degree of clustering tendency with the relative effectiveness of cluster and non-cluster searches suggests that the density test gives the most useful results; it also has the advantage that it does not require query and relevance data and can thus be used in a predictive manner when a document collection is to be processed for the first time.
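The abstract names the density test but does not define it. As an illustration only, the sketch below assumes one plausible formulation: density is the number of postings (document-term pairs) divided by the product of the number of documents and the number of distinct index terms. The function name `term_density` and the toy collection are hypothetical and are not taken from the paper. The sketch does show the property the abstract highlights, namely that such a measure needs only the documents, with no query or relevance data.

```python
def term_density(documents):
    """Assumed density measure: postings / (documents * distinct terms).

    `documents` is a list of term lists, one per document. This formula
    is an assumption made for illustration, not the paper's definition.
    """
    # Each distinct term in a document contributes one posting.
    doc_term_sets = [set(doc) for doc in documents]
    vocabulary = set().union(*doc_term_sets)
    if not doc_term_sets or not vocabulary:
        return 0.0
    postings = sum(len(terms) for terms in doc_term_sets)
    return postings / (len(doc_term_sets) * len(vocabulary))


# Hypothetical toy collection: 3 documents, 4 distinct terms, 7 postings.
toy_collection = [
    ["cluster", "retrieval", "search"],
    ["cluster", "search"],
    ["index", "retrieval"],
]
density = term_density(toy_collection)
print(f"density = {density:.4f}")
```

Under this assumed definition, a higher value means documents share more of the vocabulary. How such a value relates to the effectiveness of cluster searching is an empirical question of the kind the paper studies, and is not something this sketch establishes.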