Spectral bloom filters
- 9 June 2003
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 241-252
- https://doi.org/10.1145/872757.872787
Abstract
A Bloom Filter is a space-efficient randomized data structure allowing membership queries over sets with certain allowable errors. It is widely used in many applications which take advantage of its ability to compactly represent a set, and filter out effectively any element that does not belong to the set, with small error probability. This paper introduces the Spectral Bloom Filter (SBF), an extension of the original Bloom Filter to multi-sets, allowing the filtering of elements whose multiplicities are below a threshold given at query time. Using memory only slightly larger than that of the original Bloom Filter, the SBF supports queries on the multiplicities of individual keys with a guaranteed, small error probability. The SBF also supports insertions and deletions over the data set. We present novel methods for reducing the probability and magnitude of errors. We also present an efficient data structure and algorithms to build it incrementally and maintain it over streaming data, as well as over materialized data with arbitrary insertions and deletions. The SBF does not assume any a priori filtering threshold and effectively and efficiently maintains information over the entire data-set, allowing for ad-hoc queries with arbitrary parameters and enabling a range of new applications.Keywords
This publication has 12 references indexed in Scilit:
- Probabilistic location and routingPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- New directions in traffic measurement and accountingPublished by Association for Computing Machinery (ACM) ,2002
- Summary cacheACM SIGCOMM Computer Communication Review, 1998
- New sampling-based summary statistics for improving approximate query answersPublished by Association for Computing Machinery (ACM) ,1998
- Bifocal sampling for skew-resistant join size estimationPublished by Association for Computing Machinery (ACM) ,1996
- An algorithm for approximate membership checking with application to password securityInformation Processing Letters, 1994
- Fixed-precision estimation of join selectivityPublished by Association for Computing Machinery (ACM) ,1993
- Designing a Bloom filter for differential file accessCommunications of the ACM, 1982
- Universal codeword sets and representations of the integersIEEE Transactions on Information Theory, 1975
- Space/time trade-offs in hash coding with allowable errorsCommunications of the ACM, 1970