Document clustering using an inverted file approach

1 October 1980

journal article
Published by SAGE Publications in Journal of Information Science

Vol. 2 (5), 223-231
https://doi.org/10.1177/016555158000200503

Abstract

An automatic document clustering procedure is described which does not require the use of an inter-document similarity matrix and which is independent of the order in which the documents are processed. The procedure makes use of an initial set of clusters which is derived from certain of the terms in the indexing vocabulary used to characterise the documents in the file. The retrieval effectiveness obtained using the clustered file is compared with that obtained from serial searching and from use of the single-linkage clustering method.
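
The general idea in the abstract can be illustrated with a minimal sketch. It is not the procedure from the article, whose details are not given here. It assumes that seed terms are chosen by postings-list length (the bounds `min_postings` and `max_postings` are hypothetical), that each seed term's postings list forms an initial cluster, and that every document is then assigned to the seed whose member terms it overlaps most, using the Dice coefficient. All function names are illustrative.

```python
from collections import defaultdict

# Toy collection: document id -> set of index terms (made-up data).
EXAMPLE_DOCS = {
    "d1": {"clustering", "retrieval", "index"},
    "d2": {"clustering", "retrieval"},
    "d3": {"index", "search"},
    "d4": {"search", "boolean"},
    "d5": {"boolean", "query"},
}


def build_inverted_file(docs):
    """Map each index term to the set of document ids that carry it."""
    inverted = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)


def seed_clusters(inverted, min_postings=2, max_postings=3):
    """Take the postings lists of selected terms as initial clusters.

    Assumption: a term qualifies when its postings count lies within
    [min_postings, max_postings]. The article's criterion may differ.
    """
    return {
        term: set(ids)
        for term, ids in inverted.items()
        if min_postings <= len(ids) <= max_postings
    }


def dice(a, b):
    """Dice coefficient between two term sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def assign_documents(docs, seeds):
    """Assign each document to its best-matching seed cluster.

    Each seed is represented by the union of its members' terms, fixed
    before assignment starts, so the result does not depend on the order
    in which documents are visited. No inter-document similarity matrix
    is built; documents are only compared with seed representatives.
    """
    if not seeds:
        return {}
    reps = {
        term: set().union(*(docs[d] for d in ids))
        for term, ids in seeds.items()
    }
    clusters = {term: set() for term in seeds}
    for doc_id, terms in docs.items():
        # Iterating over sorted seed names makes tie-breaking deterministic.
        best = max(sorted(reps), key=lambda t: dice(terms, reps[t]))
        clusters[best].add(doc_id)
    return clusters
```

Because the seed representatives come from the indexing vocabulary rather than from the first documents encountered, reordering the input leaves the assignment unchanged, which is the property the abstract emphasises.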

This publication has 18 references indexed in Scilit:

Indexing exhaustivity and the computation of similarity matrices
Journal of the American Society for Information Science, 1980
Unresolved Problems in Cluster Analysis
Published by JSTOR, 1979
Clustering large files of documents using the single‐link method
Journal of the American Society for Information Science, 1977
An efficient algorithm for a complete link method
The Computer Journal, 1977
Document clustering: An evaluation of some experiments with the Cranfield 1400 collection
Information Processing & Management, 1975
A file organization and maintenance procedure for dynamic document collections
Information Processing & Management, 1975
The effect of document ordering in Rocchio's clustering algorithm
Journal of the American Society for Information Science, 1973
SLINK: An optimally efficient algorithm for the single-link cluster method
The Computer Journal, 1973
The use of hierarchic clustering in information retrieval
Information Storage and Retrieval, 1971
Controversy concerning the criteria for taxonometric strategies
The Computer Journal, 1971