Linear hash functions
- 1 September 1999
- journal article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 46 (5) , 667-683
- https://doi.org/10.1145/324133.324179
Abstract
Consider the set H of all linear (or affine) transformations between two vector spaces over a finite field F. We study how good H is as a class of hash functions, namely we consider hashing a set S of size n into a range having the same cardinality n by a randomly chosen function from H and look at the expected size of the largest hash bucket. H is a universal class of hash functions for any finite field, but with respect to our measure different fields behave differently.If the finite field F has n elements, then there is a bad set S ⊂ F2 of size n with expected maximal bucket size H(n1/3). If n is a perfect square, then there is even a bad set with largest bucket size always at least n. (This is worst possible, since with respect to a universal class of hash functions every set of size n has expected largest bucket size below n + 1/2.)If, however, we consider the field of two elements, then we get much better bounds. The best previously known upper bound on the expected size of the largest bucket for this class was O(2 log n). We reduce this upper bound to O(log n log logn). Note that this is not far from the guarantee for a random function. There, the average largest bucket would be &THgr;(log n/ log log n).In the course of our proof we develop a tool which may be of independent interest. Suppose we have a subset S of a vector space D over Z2, and consider a random linear mapping of D to a smaller vector space R. If the cardinality of S is larger than c&egr;|R|log|R|, then with probability 1 - &egr;, the image of S will cover all elements in the range.Keywords
This publication has 9 references indexed in Scilit:
- A Reliable Randomized Algorithm for the Closest-Pair ProblemJournal of Algorithms, 1997
- Combinatorial GeometryPublished by Wiley ,1995
- Dynamic Perfect Hashing: Upper and Lower BoundsSIAM Journal on Computing, 1994
- The computational complexity of universal hashingTheoretical Computer Science, 1993
- A fast and simple randomized parallel algorithm for the maximal independent set problemJournal of Algorithms, 1986
- Randomized and deterministic simulations of PRAMs by parallel machines with restricted granularity of parallel memoriesActa Informatica, 1984
- Storing a Sparse Table with 0 (1) Worst Case Access TimeJournal of the ACM, 1984
- Universal classes of hash functionsJournal of Computer and System Sciences, 1979
- On the Uniform Convergence of Relative Frequencies of Events to Their ProbabilitiesTheory of Probability and Its Applications, 1971