A parallel algorithm for record clustering

1 December 1990

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems

Vol. 15 (4) , 599-624
https://doi.org/10.1145/99935.99947

Abstract

We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an enviornment with relatively little overlap among the queries.

Keywords

This publication has 16 references indexed in Scilit:

A taxonomy of parallel sorting
ACM Computing Surveys, 1984
Parallel graph algorithms
ACM Computing Surveys, 1984
The Grid File
ACM Transactions on Database Systems, 1984
Parallel algorithms for the execution of relational database operations
ACM Transactions on Database Systems, 1983
Linear sorting withO(logn) processors
BIT Numerical Mathematics, 1983
Disk allocation for Cartesian product files on multiple-disk systems
ACM Transactions on Database Systems, 1982
Reducing block accesses in inverted files by partial clustering
Information Systems, 1980
Concepts and capabilities of a database computer\
ACM Transactions on Database Systems, 1978
Fast parallel sorting algorithms
Communications of the ACM, 1978
Optimal Sorting Algorithms for Parallel Computers
IEEE Transactions on Computers, 1978