Growing and navigating the small world Web by local content
Open Access
- 14 October 2002
- journal article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences
- Vol. 99 (22) , 14014-14019
- https://doi.org/10.1073/pnas.212348399
Abstract
Can we model the scale-free distribution of Web hypertext degree under realistic assumptions about the behavior of page authors? Can a Web crawler efficiently locate an unknown relevant page? These questions are receiving much attention due to their potential impact for understanding the structure of the Web and for building better search engines. Here I investigate the connection between the linkage and content topology of Web pages. The relationship between a text-induced distance metric and a link-based neighborhood probability distribution displays a phase transition between a region where linkage is not determined by content and one where linkage decays according to a power law. This relationship is used to propose a Web growth model that is shown to accurately predict the distribution of Web page degree, based on textual content and assuming only local knowledge of degree for existing pages. A qualitatively similar phase transition is found between linkage and semantic distance, with an exponential decay tail. Both relationships suggest that efficient paths can be discovered by decentralized Web navigation algorithms based on textual and/or categorical cues.Keywords
This publication has 27 references indexed in Scilit:
- Self-organization and identification of Web communitiesComputer, 2002
- Identity and Search in Social NetworksScience, 2002
- Path finding strategies in scale-free networksPhysical Review E, 2002
- The Structure of the WebScience, 2001
- Search in power-law networksPhysical Review E, 2001
- Bose-Einstein Condensation in Complex NetworksPhysical Review Letters, 2001
- Structure of Growing Networks with Preferential LinkingPhysical Review Letters, 2000
- Power-Law Distribution of the World Wide WebScience, 2000
- Mean-field theory for scale-free random networksPhysica A: Statistical Mechanics and its Applications, 1999
- Efficient crawling through URL orderingComputer Networks and ISDN Systems, 1998