Approximate counting of inversions in a data stream
- 19 May 2002
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 370-379
- https://doi.org/10.1145/509907.509964
Abstract
(MATH) Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model {14, 2] is a natural setting to design efficient algorithms.We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve is $O(\log n \log \log n)$ through a deterministic algorithm. In contrast, we derive an $\Omega(n)$ lower bound for randomized exact computation for this problem; thus approximation is essential.(MATH) We also consider two generalizations of this problem: (1) approximating the number of inversions between two permutations, for which we obtain a randomized $O(\sqrt{n} \log n)$-space algorithm, and (2) approximating the number of inversions in a general list, for which we obtain a randomized $O(\sqrt{n} \log^2 n)$-space two-pass algorithm. In contrast, we derive $\Omega(n)$-space lower bounds for deterministic approximate computation for these problems; thus both randomization and approximation are essential.All our algorithms use only O(log n) time per data item.
Keywords
This publication has 13 references indexed in Scilit:
- Spot-CheckersJournal of Computer and System Sciences, 2000
- The Space Complexity of Approximating the Frequency MomentsJournal of Computer and System Sciences, 1999
- Approximate Indexed ListsJournal of Algorithms, 1998
- The anatomy of a large-scale hypertextual Web search engineComputer Networks and ISDN Systems, 1998
- Exploiting few inversions when sorting: Sequential and parallel algorithmsTheoretical Computer Science, 1996
- A survey of adaptive sorting algorithmsACM Computing Surveys, 1992
- The Probabilistic Communication Complexity of Set IntersectionSIAM Journal on Discrete Mathematics, 1992
- A fast and simple randomized parallel algorithm for the maximal independent set problemJournal of Algorithms, 1986
- Approximate counting: A detailed analysisBIT Numerical Mathematics, 1985
- Counting large numbers of events in small registersCommunications of the ACM, 1978