Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning
- 1 August 2005
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 17 (9), 1263-1273
- https://doi.org/10.1109/tkde.2005.147
Abstract
Multiclass classification has been investigated for many years in the literature. Recently, real-world multiclass classification applications have grown much larger in scale. For example, hundreds of thousands of categories are employed in the Open Directory Project (ODP) and the Yahoo! directory. In such cases, the scalability of classification methods becomes a major concern. To tackle this problem, hierarchical classification has been proposed and widely adopted to achieve a better trade-off between effectiveness and efficiency. Unfortunately, many data sets are not explicitly organized in hierarchical form and, therefore, hierarchical classification cannot be applied to them directly. In this paper, we propose a novel algorithm to automatically mine a hierarchical structure from the flat taxonomy of a data corpus as preparation for adopting hierarchical classification. In particular, we first compute matrices that represent the relations among categories, documents, and terms. Then, we cocluster these three types of objects at different scales through consistent bipartite spectral graph copartitioning, which is formulated as a generalized singular value decomposition problem. Finally, a hierarchical taxonomy is constructed from the category clusters. Our experiments showed that the proposed algorithm could discover a very reasonable taxonomy hierarchy and help improve classification accuracy.
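To make the copartitioning step more concrete, the following is a minimal sketch of ordinary two-way bipartite spectral copartitioning on a single nonnegative relation matrix (for instance, documents by terms). It is not the consistent three-way formulation described in the abstract, which couples categories, documents, and terms through a generalized singular value decomposition; it only illustrates the simpler building block using a plain SVD of the degree-normalized matrix. The function name `bipartite_spectral_bipartition` and the toy matrix are assumptions introduced for illustration, and the sketch assumes every row and column has positive total weight.

```python
import numpy as np


def bipartite_spectral_bipartition(A):
    """Split the rows and columns of a nonnegative matrix into two co-clusters.

    Hypothetical helper for illustration only. Assumes every row and
    every column of A has a strictly positive sum.
    """
    A = np.asarray(A, dtype=float)
    row_deg = A.sum(axis=1)  # degree of each row vertex
    col_deg = A.sum(axis=0)  # degree of each column vertex

    # Degree-normalized matrix D1^{-1/2} A D2^{-1/2}.
    A_norm = A / np.sqrt(row_deg)[:, None] / np.sqrt(col_deg)[None, :]

    # The leading singular pair is trivial (proportional to the square
    # roots of the degrees); the second pair carries the partition.
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    u2 = U[:, 1] / np.sqrt(row_deg)
    v2 = Vt[1, :] / np.sqrt(col_deg)

    # Threshold at zero; the overall sign of the singular pair is
    # arbitrary, so only the grouping is meaningful, not the label value.
    row_labels = (u2 > 0).astype(int)
    col_labels = (v2 > 0).astype(int)
    return row_labels, col_labels


# Toy relation matrix: rows 0-1 mostly use columns 0-1, rows 2-3 mostly
# use columns 2-3, with weak cross links keeping the graph connected.
toy = np.array([
    [5.0, 4.0, 0.1, 0.1],
    [4.0, 5.0, 0.1, 0.1],
    [0.1, 0.1, 4.0, 5.0],
    [0.1, 0.1, 5.0, 4.0],
])
rows, cols = bipartite_spectral_bipartition(toy)
```

Applying such a split recursively to each resulting co-cluster is one simple way to obtain clusters at several scales, although how the paper derives its hierarchy from the category clusters is not specified in this record.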
This publication has 14 references indexed in Scilit:
- An experimental study on large-scale web categorization. Published by Association for Computing Machinery (ACM), 2005
- Fuzzy co-clustering of documents and keywords. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
- A hierarchical method for multi-class support vector machines. Published by Association for Computing Machinery (ACM), 2004
- ReCoM. Published by Association for Computing Machinery (ACM), 2003
- A scalability analysis of classifiers in text categorization. Published by Association for Computing Machinery (ACM), 2003
- Machine learning in automated text categorization. ACM Computing Surveys, 2002
- Bipartite graph partitioning and data clustering. Published by Association for Computing Machinery (ACM), 2001
- Normalized cuts and image segmentation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2000
- The Nature of Statistical Learning Theory. Published by Springer Nature, 1995
- Evaluating text categorization. Published by Association for Computational Linguistics (ACL), 1991