Web page classification
- 23 February 2009
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 41 (2), 1-31
- https://doi.org/10.1145/1459352.1459357
Abstract
Classification of Web page content is essential to many tasks in Web information retrieval such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in Web page classification, we note the importance of these Web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
Funding Information
- Division of Information and Intelligent Systems (IIS-0328825)