Block-level link analysis
- 25 July 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- pp. 440-447
- https://doi.org/10.1145/1008992.1009068
Abstract
Link analysis has shown great potential for improving the performance of web search, and PageRank and HITS are two of the most popular algorithms. Most existing link analysis algorithms treat a web page as a single node in the web graph. In most cases, however, a web page contains multiple semantics, so the page should not be treated as an atomic node. In this paper, each web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting page-to-block and block-to-page relationships from link structure and page layout analysis, we construct a semantic graph over the WWW in which each node represents exactly one semantic topic. This graph better describes the semantic structure of the web. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, and study their performance extensively using web data.
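The abstract describes the graph construction only at a high level. The sketch below is a minimal illustration of one way the page-to-block and block-to-page relationships could be combined. It makes two assumptions. First, every block within a page gets equal weight; the paper may instead weight blocks by an importance score. Second, each block splits its outgoing links uniformly across the pages it points to. Under these assumptions, a page-to-block matrix X and a block-to-page matrix Z compose into a page graph W_P = X·Z, and ordinary PageRank power iteration runs on that graph. All function names, parameters, and the toy graph are hypothetical and are not taken from the paper.

```python
"""Hedged sketch of block-level PageRank; weighting choices are assumptions."""


def matmul(a, b):
    """Multiply two dense matrices stored as lists of lists."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def page_to_block(blocks_per_page, num_blocks):
    """X[p][b] = 1/s_p if block b belongs to page p (s_p = blocks on page p).

    Assumption: equal weight per block; the paper may use block importance.
    """
    x = [[0.0] * num_blocks for _ in blocks_per_page]
    for p, blocks in enumerate(blocks_per_page):
        for b in blocks:
            x[p][b] = 1.0 / len(blocks)
    return x


def block_to_page(links_per_block, num_pages):
    """Z[b][q] = 1/t_b if block b links to page q (t_b = pages linked from b)."""
    z = [[0.0] * num_pages for _ in links_per_block]
    for b, targets in enumerate(links_per_block):
        for q in targets:
            z[b][q] = 1.0 / len(targets)
    return z


def pagerank(w, damping=0.85, iterations=100):
    """Power iteration on a row-(sub)stochastic transition matrix w."""
    n = len(w)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Probability mass lost through rows summing to less than 1
        # (e.g. pages whose blocks have no links) is spread uniformly.
        leaked = 1.0 - sum(rank[i] * sum(w[i]) for i in range(n))
        rank = [
            (1 - damping) / n
            + damping * (sum(rank[i] * w[i][j] for i in range(n)) + leaked / n)
            for j in range(n)
        ]
    return rank


# Hypothetical toy web: 3 pages segmented into 4 blocks.
blocks_per_page = [[0, 1], [2], [3]]         # page 0 holds blocks 0 and 1
links_per_block = [[1], [2], [0], [0, 1]]    # block 3 links to pages 0 and 1
x = page_to_block(blocks_per_page, num_blocks=4)
z = block_to_page(links_per_block, num_pages=3)
w_p = matmul(x, z)  # page graph: [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
ranks = pagerank(w_p)
print([round(r, 3) for r in ranks])
```

In this toy graph page 0 ranks highest because two separate blocks point to it. Composing the same matrices in the other order, Z·X, gives a block-to-block graph. That is one plausible input for a block-level HITS variant, though the abstract does not specify that construction.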