Bigfoot, sasquatch, the yeti and other missing links
- 20 October 2008
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 325-330
- https://doi.org/10.1145/1452520.1452558
Abstract
Copyright © 2008 ACMStudy of the Internet's high-level structure has for some time intrigued scientists. The AS-graph (showing interconnections between Autonomous Systems) has been measured, studied, modelled and discussed in many papers over the last decade. However, the quality of the measurement data has always been in question. It is by now well known that most measurements of the AS-graph are missing some set of links. Many efforts have been undertaken to correct this, primarily by increasing the set of measurements, but the issue remains: how much is enough? When will we know that we have enough measurements to be sure we can see all (or almost all) of the links. This paper aims to address the problem of estimating how many links are missing from our measurements. We use techniques pioneered in biostatistics and epidemiology for estimating the size of populations (for instance of fish or disease carriers). It is rarely possible to observe entire populations, and so sampling techniques are used. We extend those techniques to the domain of the AS-graph. The key difference between our work and the biological literature is that all links are not the same, and so we build a stratified model and specify an EM algorithm for estimating its parameters. Our estimates suggest that a very significant number of links (many of thousands) are missing from standard route monitor measurements of the AS-graph. Finally, we use the model to derive the number of monitors that would be needed to see a complete AS-graph with high-probability. We estimate that 700 route monitors would see 99.9% of linksKeywords
This publication has 10 references indexed in Scilit:
- Testing the reachability of (new) address spacePublished by Association for Computing Machinery (ACM) ,2007
- AS relationshipsACM SIGCOMM Computer Communication Review, 2007
- Building an AS-topology model that captures route diversityPublished by Association for Computing Machinery (ACM) ,2006
- The “robust yet fragile” nature of the InternetProceedings of the National Academy of Sciences, 2005
- Collecting the internet AS-level topologyACM SIGCOMM Computer Communication Review, 2005
- Sampling biases in IP topology measurementsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2004
- The Elements of Statistical LearningPublished by Springer Nature ,2001
- On power-law relationships of the Internet topologyPublished by Association for Computing Machinery (ACM) ,1999
- Capture-Recapture and Multiple-Record Systems Estimation II: Applications in Human DiseasesAmerican Journal of Epidemiology, 1995
- Truncated Binomial and Negative Binomial DistributionsJournal of the American Statistical Association, 1955