Ranking the web frontier
- 17 May 2004
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 309-318
- https://doi.org/10.1145/988672.988714
Abstract
The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving prominent features of the web, and propose several algorithmic innovations. First, we analyze features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another. We analyze the effect of these pages and find it to be significant. We suggest ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance. Finally we suggest new methods of ranking that are motivated by the hierarchical structure of the web, are more efficient than PageRank, and may be more resistant to direct manipulation.