Faster methods for random sampling
- 1 July 1984
- journal article
- Published by Association for Computing Machinery (ACM) in Communications of the ACM
- Vol. 27 (7), 703-718
- https://doi.org/10.1145/358105.893
Abstract
Several new methods are presented for selecting n records at random without replacement from a file containing N records. Each algorithm selects the records for the sample in a sequential manner, that is, in the same order the records appear in the file. The algorithms are online in that the records for the sample are selected iteratively with no preprocessing. The algorithms require a constant amount of space and are short and easy to implement. The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time, on the average; roughly n uniform random variates are generated, and approximately n exponentiation operations (of the form a^b, for real numbers a and b) are performed during the sampling. This solves an open problem in the literature. CPU timings on a large mainframe computer indicate that Algorithm D is significantly faster than the sampling algorithms in use today.
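The sequential, constant-space selection described in the abstract can be illustrated with the classical selection-sampling scheme, in which each record is chosen with probability (records still needed) / (records still unread). The following is a minimal Python sketch of that simpler baseline, not of Algorithm D: it draws one uniform variate for every record it reads, so its running time grows with N rather than with n. The function name `sequential_sample` and its parameters are illustrative assumptions and do not come from the paper.

```python
import random


def sequential_sample(records, n, N, rng=random):
    """Yield n of the N records in file order, without replacement.

    Baseline selection sampling (NOT Algorithm D): one uniform variate
    is drawn per record read. `records` must yield exactly N items.
    """
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    still_needed = n   # records left to select
    still_unread = N   # records left in the file
    for record in records:
        if still_needed == 0:
            break      # sample complete; the rest of the file is skipped
        # Select this record with probability still_needed / still_unread.
        if rng.random() * still_unread < still_needed:
            yield record
            still_needed -= 1
        still_unread -= 1


# Usage: draw 10 of 100 records; the result preserves file order.
sample = list(sequential_sample(range(100), 10, 100, random.Random(0)))
```

Only two counters are kept, which matches the constant-space, no-preprocessing setting of the abstract. Algorithm D improves on this kind of scheme by generating how many records to skip before the next selection instead of testing every record.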
This publication has 5 references indexed in Scilit:
- Optimum algorithms for two random sampling problems. Published by Institute of Electrical and Electronics Engineers (IEEE), 1983
- An Algorithm for Unbiased Random Sampling. The Computer Journal, 1982
- Generating Sorted Lists of Random Numbers. ACM Transactions on Mathematical Software, 1980
- A note on sampling a tape-file. Communications of the ACM, 1962
- Development of Sampling Plans by Using Sequential (Item by Item) Selection Techniques and Digital Computers. Journal of the American Statistical Association, 1962