A similarity graph-based approach to declustering problems and its application towards parallelizing grid files

19 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 373-381
https://doi.org/10.1109/icde.1995.380370

Abstract

We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of data-items that are to be accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query-set, if there exists any other declustering method which will achieve the optimal speed-up. Experiments in parallelizing grid files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well for non-uniform data distributions.<>

Keywords

This publication has 12 references indexed in Scilit:

Declustering using fractals
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2002
Latin squares for parallel array access
IEEE Transactions on Parallel and Distributed Systems, 1993
Disk allocation methods using error correcting codes
IEEE Transactions on Computers, 1991
An improved two-way partitioning algorithm with stable performance (VLSI)
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1991
The Gamma database machine project
IEEE Transactions on Knowledge and Data Engineering, 1990
Optimal file distribution for partial match retrieval
ACM SIGMOD Record, 1988
Performance of two-disk partition data allocations
BIT Numerical Mathematics, 1987
An Algorithm for Partitioning the Nodes of a Graph
SIAM Journal on Algebraic Discrete Methods, 1982
Disk allocation for Cartesian product files on multiple-disk systems
ACM Transactions on Database Systems, 1982
An Efficient Heuristic Procedure for Partitioning Graphs
Bell System Technical Journal, 1970