Curvature of co-links uncovers hidden thematic layers in the World Wide Web
Open Access
- 23 April 2002
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 99 (9), 5825-5829
- https://doi.org/10.1073/pnas.032093399
Abstract
Beyond the information stored in pages of the World Wide Web, novel types of “meta-information” are created when pages connect to each other. Such meta-information is a collective effect of independent agents writing and linking pages, hidden from the casual user. Accessing it and understanding the interrelation between connectivity and content in the World Wide Web is a challenging problem [Botafogo, R. A. & Shneiderman, B. (1991) in Proceedings of Hypertext (Assoc. Comput. Mach., New York), pp. 63–77 and Albert, R. & Barabási, A.-L. (2002) Rev. Mod. Phys. 74, 47–97]. We demonstrate here how thematic relationships can be located precisely by looking only at the graph of hyperlinks, gleaning content and context from the Web without having to read what is in the pages. We begin by noting that reciprocal links (co-links) between pages signal a mutual recognition of authors and then focus on triangles containing such links, because triangles indicate a transitive relation. The importance of triangles is quantified by the clustering coefficient [Watts, D. J. & Strogatz, S. H. (1998) Nature (London) 393, 440–442], which we interpret as a curvature [Bridson, M. R. & Haefliger, A. (1999) Metric Spaces of Non-Positive Curvature (Springer, Berlin)]. This curvature defines a World Wide Web landscape whose connected regions of high curvature characterize a common topic. We show experimentally that reciprocity and curvature, when combined, accurately capture this meta-information for a wide variety of topics. As an example of future directions we analyze the neural network of Caenorhabditis elegans, using the same methods.
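The steps named in the abstract (keep reciprocal links, count triangles, treat the clustering coefficient as a curvature, then look for connected regions of high curvature) can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation: the function names `co_link_graph`, `curvature`, and `high_curvature_regions`, the toy link list, and the use of a fixed `threshold` to decide what counts as "high" curvature are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def co_link_graph(links):
    """Keep only reciprocal hyperlinks (a -> b and b -> a) as undirected edges."""
    directed = {(a, b) for a, b in links if a != b}
    adj = defaultdict(set)
    for a, b in directed:
        if (b, a) in directed:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def curvature(adj):
    """Clustering coefficient per node: triangles through it / pairs of neighbors."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            # A node with fewer than two neighbors cannot sit in a triangle.
            result[node] = 0.0
            continue
        triangles = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        result[node] = triangles / (k * (k - 1) / 2)
    return result


def high_curvature_regions(adj, curv, threshold):
    """Connected components among nodes whose curvature is >= threshold.

    The threshold is a hypothetical parameter chosen for this sketch.
    """
    keep = {n for n, c in curv.items() if c >= threshold}
    seen, regions = set(), []
    for start in keep:
        if start in seen:
            continue
        stack, region = [start], set()
        seen.add(start)
        while stack:
            n = stack.pop()
            region.add(n)
            for m in adj[n]:
                if m in keep and m not in seen:
                    seen.add(m)
                    stack.append(m)
        regions.append(region)
    return regions


# Toy directed hyperlinks (hypothetical pages): a, b, c link to each other
# reciprocally, c and d co-link, and d -> e is a one-way link that is dropped.
links = [
    ("a", "b"), ("b", "a"),
    ("b", "c"), ("c", "b"),
    ("a", "c"), ("c", "a"),
    ("c", "d"), ("d", "c"),
    ("d", "e"),
]
adj = co_link_graph(links)
curv = curvature(adj)
regions = high_curvature_regions(adj, curv, threshold=0.5)
```

In this toy graph, pages `a` and `b` each have curvature 1.0 (their two co-link neighbors are themselves co-linked), `c` has curvature 1/3, and `d` has curvature 0, so a threshold of 0.5 leaves the single region `{a, b}`. Page `e` never enters the graph because its only link is not reciprocated.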
This publication has 14 references indexed in Scilit:
- Error and attack tolerance of complex networks. Nature, 2000
- On the Genealogy of a Population of Biparental Individuals. Journal of Theoretical Biology, 2000
- Emergence of Scaling in Random Networks. Science, 1999
- Authoritative sources in a hyperlinked environment. Journal of the ACM, 1999
- Accessibility of information on the web. Nature, 1999
- The Web as a Graph: Measurements, Models, and Methods. Published by Springer Nature, 1999
- Collective dynamics of ‘small-world’ networks. Nature, 1998
- The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 1986
- Ergodic theory of chaos and strange attractors. Reviews of Modern Physics, 1985
- Co‐citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 1973