Unconstrained endpoint profiling (googling the internet)
- 17 August 2008
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGCOMM Computer Communication Review
- Vol. 38 (4), pp. 279-290
- https://doi.org/10.1145/1402946.1402991
Abstract
Understanding Internet access trends at a global scale, i.e., what people do on the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges, owing either to privacy concerns or to other operational difficulties. The key hypothesis of our work is that most of the information needed to profile Internet endpoints is already available around us: on the web. In this paper, we introduce a novel approach for profiling and classifying endpoints. We implement and deploy a Google-based profiling tool that accurately characterizes endpoint behavior by collecting and strategically combining information freely available on the web. Our 'unconstrained endpoint profiling' approach shows remarkable advances in the following scenarios: (i) even when no packet traces are available, it can accurately predict application and protocol usage trends at arbitrary networks; (ii) when network traces are available, it dramatically outperforms state-of-the-art classification tools; (iii) when sampled flow-level traces are available, it retains high classification capabilities where other schemes literally fall apart. Using this approach, we perform unconstrained endpoint profiling at a global scale, for clients in four different world regions (Asia, South America, North America, and Europe). We provide the first-of-its-kind endpoint analysis, which reveals fascinating similarities and differences among these regions.
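To make the idea of "collecting and combining information freely available on the web" more concrete, the following is a minimal sketch of one plausible approach: tagging an IP address by counting keyword matches in the text of web-search hits that mention it. It is not the authors' tool. The keyword-to-tag table (`TAG_RULES`), the function name `profile_endpoint`, and the assumption that search-result snippets have already been fetched are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword-to-tag rules; an illustrative assumption,
# not the rule set used in the paper.
TAG_RULES = {
    "p2p": ("bittorrent", "emule", "gnutella", "tracker"),
    "mail server": ("smtp", "mail server", "spam", "blacklist"),
    "web server": ("http server", "apache", "website"),
    "gaming": ("game server", "counter-strike"),
    "chat": ("irc", "messenger"),
}


def profile_endpoint(ip, snippets):
    """Tag an IP address using text snippets from web-search hits.

    `snippets` stands in for the title/URL/excerpt text of hits returned
    for a query on `ip`; fetching them is assumed to happen elsewhere.
    Returns (tag, hit_count) pairs, most frequent first.
    """
    # Match the address only as a whole token, so "1.2.3.4" does not
    # match inside "11.2.3.45" but does match before a trailing period.
    ip_pattern = re.compile(r"(?<!\d)(?<!\d\.)" + re.escape(ip) + r"(?!\.?\d)")
    counts = Counter()
    for text in snippets:
        # Skip hits that do not literally mention the queried address.
        if not ip_pattern.search(text):
            continue
        lowered = text.lower()
        # Each hit contributes at most one vote per tag.
        for tag, keywords in TAG_RULES.items():
            if any(keyword in lowered for keyword in keywords):
                counts[tag] += 1
    return counts.most_common()


if __name__ == "__main__":
    hits = [
        "1.2.3.4 listed as BitTorrent tracker peer",
        "Spam report: SMTP relay 1.2.3.4 blacklisted",
        "Forum post mentions 11.2.3.45 game server",
    ]
    print(profile_endpoint("1.2.3.4", hits))
```

Here the third hit is ignored because it mentions a different address, so the endpoint receives one vote each for "p2p" and "mail server". Counting votes per tag across many hits, rather than trusting any single page, is one simple way to combine noisy web evidence.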