Extracting knowledge from the World Wide Web
- 6 April 2004
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 101 (Suppl. 1), pp. 5186-5191
- https://doi.org/10.1073/pnas.0307528100
Abstract
The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesting and valuable information can be extracted, such as the distribution of web pages over domains, the distribution of interest in different areas, communities related to different topics, the nature of competition in different categories of sites, and the degree of communication between different communities or countries.
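The abstract mentions analyzing how links accumulate over time and the nature of competition between sites. One common way to model link accumulation, assumed here for illustration and not necessarily the exact model used in the paper, is a mixture: each new link points to an existing page chosen in proportion to its current in-degree with probability `alpha` ("rich get richer"), and uniformly at random otherwise. The names `grow_links`, `top_share`, and `alpha` are hypothetical. The sketch simulates this growth and reports what share of all links the top 1% of pages receive, which gives a rough sense of how concentrated the competition for links is.

```python
import random


def grow_links(num_pages, links_per_step, alpha, seed=0):
    """Hypothetical growth model: page t (t >= 1) adds `links_per_step` links
    to earlier pages 0..t-1. Each link picks its target preferentially
    (probability alpha) or uniformly at random (probability 1 - alpha)."""
    rng = random.Random(seed)
    in_degree = [0] * num_pages
    # Every received link appends its target here, so choosing uniformly
    # from this list selects a page with probability proportional to in-degree.
    targets = []
    for new_page in range(1, num_pages):
        for _ in range(links_per_step):
            if targets and rng.random() < alpha:
                target = rng.choice(targets)        # preferential attachment
            else:
                target = rng.randrange(new_page)    # uniform over earlier pages
            in_degree[target] += 1
            targets.append(target)
    return in_degree


def top_share(in_degree, fraction=0.01):
    """Share of all links received by the top `fraction` of pages."""
    ranked = sorted(in_degree, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    # Larger alpha should concentrate links on fewer pages.
    for alpha in (0.0, 0.5, 0.9):
        degrees = grow_links(20000, 3, alpha)
        print(f"alpha={alpha}: top 1% of pages hold {top_share(degrees):.1%} of links")
```

Under these assumptions, raising `alpha` increases the share held by the most-linked pages. Keeping a nonzero uniform component means newer or less-linked pages can still attract links, which is one simple way to represent categories where "winners don't take all".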