Support vector machines classification with a very large-scale taxonomy
- 1 June 2005
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGKDD Explorations Newsletter
- Vol. 7 (1) , 36-43
- https://doi.org/10.1145/1089815.1089821
Abstract
Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and skewed category distribution over documents. However, it is still an open question whether the state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report the first evaluation of Support Vector Machines (SVMs) in web-page classification over the full taxonomy of the Yahoo! categories. Our accomplishments include: 1) a data analysis on the Yahoo! taxonomy; 2) the development of a scalable system for large-scale text categorization; 3) theoretical analysis and experimental evaluation of SVMs in hierarchical and non-hierarchical settings for classification; 4) an investigation of threshold tuning algorithms with respect to time complexity and their effect on the classification accuracy of SVMs. We found that, in terms of scalability, the hierarchical use of SVMs is efficient enough for very large-scale classification; however, in terms of effectiveness, the performance of SVMs over the Yahoo! Directory is still far from satisfactory, which indicates that more substantial investigation is needed.
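The contrast between hierarchical and non-hierarchical (flat) classification mentioned in the abstract can be illustrated with a minimal sketch. It is not the system described in the paper: the `Node` class, the `score` and `classify_top_down` functions, the toy taxonomy, and the hand-set weights are all hypothetical, and each linear scorer merely stands in for a trained SVM. The sketch routes a document top-down, descending only into categories whose decision value exceeds a threshold, and counts how many scorers were evaluated compared with scoring every category.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    weights: dict          # hypothetical linear scorer (term -> weight), a stand-in for a trained SVM
    bias: float = 0.0
    children: list = field(default_factory=list)


def score(node, doc):
    """Linear decision value w.x + b over a bag-of-words dict."""
    return sum(w * doc.get(term, 0.0) for term, w in node.weights.items()) + node.bias


def classify_top_down(root, doc, threshold=0.0):
    """Follow every child whose score exceeds the threshold.

    Returns (assigned category names, number of scorer evaluations).
    """
    assigned, evaluations = [], 0
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in node.children:
            evaluations += 1
            if score(child, doc) > threshold:
                assigned.append(child.name)
                frontier.append(child)
    return assigned, evaluations


def count_categories(root):
    """Number of categories below the root, i.e. scorers a flat classifier evaluates."""
    return sum(1 + count_categories(child) for child in root.children)


# Toy two-level taxonomy (assumed for illustration only).
science = Node("Science", {"physics": 1.0, "biology": 1.0}, -0.5, [
    Node("Physics", {"physics": 1.0}, -0.5),
    Node("Biology", {"biology": 1.0}, -0.5),
])
sports = Node("Sports", {"soccer": 1.0, "tennis": 1.0}, -0.5, [
    Node("Soccer", {"soccer": 1.0}, -0.5),
    Node("Tennis", {"tennis": 1.0}, -0.5),
])
root = Node("root", {}, 0.0, [science, sports])

doc = {"physics": 2.0, "quantum": 1.0}
assigned, evaluations = classify_top_down(root, doc)
flat_evaluations = count_categories(root)
strict_assigned, strict_evaluations = classify_top_down(root, doc, threshold=2.0)
```

In this toy case the top-down pass evaluates 4 scorers, while a flat pass would evaluate all 6. Raising the threshold to 2.0 rejects every top-level category, so nothing is assigned after only 2 evaluations. This shows in miniature why threshold choice affects both cost and accuracy in a hierarchical setting.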