Co-clustering documents and words using bipartite spectral graph partitioning
Top Cited Papers
- 26 August 2001
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 269-274
- https://doi.org/10.1145/502512.502550
Abstract
Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.Keywords
This publication has 14 references indexed in Scilit:
- Normalized cuts and image segmentationPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2000
- Distributional clustering of words for text classificationPublished by Association for Computing Machinery (ACM) ,1998
- Projections for efficient document clusteringPublished by Association for Computing Machinery (ACM) ,1997
- Self-Organizing MapsPublished by Springer Nature ,1995
- Scatter/Gather: a cluster-based approach to browsing large document collectionsPublished by Association for Computing Machinery (ACM) ,1992
- New spectral methods for ratio cut partitioning and clusteringIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1992
- Partitioning Sparse Matrices with Eigenvectors of GraphsSIAM Journal on Matrix Analysis and Applications, 1990
- Lower Bounds for the Partitioning of GraphsIBM Journal of Research and Development, 1973
- An r-Dimensional Quadratic Placement AlgorithmManagement Science, 1970
- An Efficient Heuristic Procedure for Partitioning GraphsBell System Technical Journal, 1970