Web classification using support vector machine
Open Access
- 8 November 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
Abstract
In web classification, web pages from one or more web sites are assigned to pre-defined categories according to their content. Since web pages are more than just plain text documents, web classification methods have to consider other context features of web pages, such as hyperlinks and HTML tags. In this paper, we propose the use of Support Vector Machine (SVM) classifiers to classify web pages using both their text and context feature sets. We have evaluated our web classification method on the WebKB data set. Compared with the earlier FOIL-PILFS method on the same data set, our method has been shown to perform very well. We have also shown that the use of context features, especially hyperlinks, can improve the classification performance significantly.
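The idea of combining text and context features in one SVM input can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not specify the feature encoding or the SVM settings, so the prefixes `text:`, `title:` and `anchor:`, the helper names `extract_features`, `train_linear_svm` and `predict`, the Pegasos-style subgradient training loop, and the toy pages are all assumptions made here. The sketch also uses a page's own anchor text as a stand-in for hyperlink context.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class PageFeatureParser(HTMLParser):
    """Counts tokens, tagging each one as body text, title, or anchor text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.open_tags:
            prefix = "title:"      # context feature (assumed encoding)
        elif "a" in self.open_tags:
            prefix = "anchor:"     # context feature (assumed encoding)
        else:
            prefix = "text:"       # plain text feature
        for token in re.findall(r"[a-z]+", data.lower()):
            self.counts[prefix + token] += 1


def extract_features(html):
    """Return an L2-normalised sparse feature vector for one page."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    norm = math.sqrt(sum(c * c for c in parser.counts.values())) or 1.0
    return {f: c / norm for f, c in parser.counts.items()}


def score(weights, x):
    return sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Linear SVM (hinge loss, L2 penalty) via deterministic subgradient steps."""
    weights = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(weights, x) < 1
            for f in weights:                      # shrink step from the L2 penalty
                weights[f] *= 1.0 - eta * lam
            if violated:                           # hinge-loss subgradient step
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


def predict(weights, x):
    # Ties (score of exactly 0) fall to the negative class.
    return 1 if score(weights, x) > 0 else -1


# Hypothetical toy pages: +1 = course page, -1 = faculty page.
pages = [
    "<title>CS101 course</title><p>syllabus homework lectures</p><a href='x'>assignments</a>",
    "<title>Algorithms course</title><p>homework exams lectures</p><a href='x'>syllabus</a>",
    "<title>Professor Smith</title><p>research publications students</p><a href='x'>publications</a>",
    "<title>Professor Lee</title><p>research interests publications</p><a href='x'>students</a>",
]
labels = [1, 1, -1, -1]
model = train_linear_svm([extract_features(p) for p in pages], labels)

print(predict(model, extract_features("<title>Databases course</title><p>lectures and homework</p>")))
print(predict(model, extract_features("<p>research publications</p>")))
```

Prefixing tokens by their source keeps a word in a title or link distinct from the same word in body text, so the classifier can weight the two differently. A multi-class setting such as WebKB would need one such binary classifier per category or a similar scheme.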
This publication has 10 references indexed in Scilit:
- Using web structure for classifying and describing web pages. Published by Association for Computing Machinery (ACM), 2002
- Machine learning in automated text categorization. ACM Computing Surveys, 2002
- A Study of Approaches to Hypertext Categorization. Journal of Intelligent Information Systems, 2002
- A study of thresholding strategies for text categorization. Published by Association for Computing Machinery (ACM), 2001
- Relational Learning with Statistical Predicate Invention: Better Models for Hypertext. Machine Learning, 2001
- A practical hypertext catergorization method using links and incrementally available class information. Published by Association for Computing Machinery (ACM), 2000
- Hierarchical classification of Web content. Published by Association for Computing Machinery (ACM), 2000
- An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval Journal, 1999
- Inductive learning algorithms and representations for text categorization. Published by Association for Computing Machinery (ACM), 1998
- Enhanced hypertext categorization using hyperlinks. Published by Association for Computing Machinery (ACM), 1998