Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases
- 1 December 1990
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 15 (4), 483-517
- https://doi.org/10.1145/99935.99938
Abstract
A new algorithm for document clustering is introduced. The base concept of the algorithm, the cover coefficient (CC) concept, provides a means of estimating the number of clusters within a document database and related indexing and clustering analytically. The CC concept is used also to identify the cluster seeds and to form clusters with these seeds. It is shown that the complexity of the clustering process is very low. The retrieval experiments show that the information-retrieval effectiveness of the algorithm is compatible with a very demanding complete linkage clustering method that is known to have good retrieval performance. The experiments also show that the algorithm is 15.1 to 63.5 (with an average of 47.5) percent better than four other clustering algorithms in cluster-based information retrieval. The experiments have validated the indexing-clustering relationships and the complexity of the algorithm and have shown improvements in retrieval effectiveness. In the experiments two document databases are used: TODS214 and INSPEC. The latter is a common database with 12,684 documents.
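The abstract does not spell out the CC formulas, so the following is only a minimal sketch of how the cluster-count estimate might be computed. It assumes the definition commonly given in the cover-coefficient literature: for a document-by-term matrix D, c_ij = (1 / row_sum_i) * sum_k d_ik * d_jk / col_sum_k, with the estimated number of clusters taken as the sum of the diagonal entries c_ii. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def cover_coefficient_matrix(d: Matrix) -> Matrix:
    """Return the m x m matrix C for an m x n document-by-term matrix d.

    Assumed definition (not stated in the abstract):
        c[i][j] = (1 / row_sum[i]) * sum_k d[i][k] * d[j][k] / col_sum[k]
    """
    m, n = len(d), len(d[0])
    row_sums = [sum(row) for row in d]
    col_sums = [sum(d[i][k] for i in range(m)) for k in range(n)]
    # Every document needs at least one term and every term at least one
    # document; otherwise the normalisation divides by zero.
    if any(s == 0 for s in row_sums) or any(s == 0 for s in col_sums):
        raise ValueError("empty document row or unused term column")
    c = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            shared = sum(d[i][k] * d[j][k] / col_sums[k] for k in range(n))
            c[i][j] = shared / row_sums[i]
    return c


def estimate_cluster_count(d: Matrix) -> float:
    """Sum the diagonal of C (each document's coverage of itself)."""
    c = cover_coefficient_matrix(d)
    return sum(c[i][i] for i in range(len(d)))


# Hypothetical toy data: two pairs of documents with disjoint vocabularies.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(estimate_cluster_count(toy))
```

On this toy matrix the estimate comes out as two, matching the two disjoint groups. A collection in which no two documents share a term would give one cluster per document, and a collection of identical documents would give a single cluster.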