Reclustering of high energy physics data

20 January 2003

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 194-203
https://doi.org/10.1109/ssdm.1999.787635

Abstract

Upcoming high-energy physics experiments will store petabytes of data in object databases. Analysis jobs will frequently traverse collections containing millions of stored objects. Clustering is one of the most effective means of enhancing the performance of these applications. The paper presents a reclustering algorithm for independent objects contained in multiple, possibly overlapping, collections on secondary storage. The algorithm decomposes the stored objects into a number of independent chunks and then maps these chunks onto a traveling salesman problem. Under a set of realistic assumptions, the number of disk seeks is reduced almost to the theoretical minimum. Experimental results obtained from a prototype are included.
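The abstract does not spell out how chunks are formed or what the salesman's distance is, so the following is only a toy sketch under stated assumptions, not the paper's algorithm. It assumes that a chunk is the group of objects belonging to exactly the same set of collections. It also assumes that the distance between two chunks is the number of collections containing one but not the other (a Hamming distance on membership sets). Under that model, placing an empty "depot" set at both ends of the disk layout makes the tour length exactly twice the number of contiguous runs, so a shortest tour minimizes seeks. All names (`decompose_into_chunks`, `best_order`, `seek_count`) and the sample data are hypothetical. Brute-force search stands in for a real TSP solver and is only viable for a handful of chunks.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List

# Hypothetical input: object ID -> set of collections that contain it.
Membership = Dict[str, FrozenSet[str]]


def decompose_into_chunks(membership: Membership) -> Dict[FrozenSet[str], List[str]]:
    """Assumption: a chunk groups objects that belong to exactly the same collections."""
    chunks: Dict[FrozenSet[str], List[str]] = {}
    for obj, colls in membership.items():
        chunks.setdefault(colls, []).append(obj)
    return chunks


def hamming(a: FrozenSet[str], b: FrozenSet[str]) -> int:
    """Assumed distance: collections that contain one chunk but not the other."""
    return len(a ^ b)


def seek_count(order: List[FrozenSet[str]], collections: FrozenSet[str]) -> int:
    """Toy cost model: reading a collection costs one seek per contiguous run of its chunks."""
    seeks = 0
    for c in collections:
        inside = False
        for chunk in order:
            if c in chunk and not inside:
                seeks += 1  # entering a new run of collection c
            inside = c in chunk
    return seeks


def best_order(chunk_keys: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Exact TSP by brute force (illustration only; exponential in the number of chunks).

    An empty depot set sits before the first and after the last chunk, so every run
    contributes one entry and one exit: tour length == 2 * total runs."""
    depot: FrozenSet[str] = frozenset()

    def tour_length(order) -> int:
        path = [depot, *order, depot]
        return sum(hamming(a, b) for a, b in zip(path, path[1:]))

    return list(min(permutations(chunk_keys), key=tour_length))


# Hypothetical example: six objects spread over three overlapping collections A, B, C.
membership: Membership = {
    "o1": frozenset({"A"}),
    "o2": frozenset({"A", "B"}),
    "o3": frozenset({"B"}),
    "o4": frozenset({"A"}),
    "o5": frozenset({"C"}),
    "o6": frozenset({"B", "C"}),
}
chunks = decompose_into_chunks(membership)
all_collections = frozenset().union(*membership.values())
naive = list(chunks)  # chunks in first-seen order
optimized = best_order(naive)
print(seek_count(naive, all_collections), seek_count(optimized, all_collections))  # 4 3
```

Here the first-seen layout splits collection B into two runs (4 seeks), while the tour-based layout keeps every collection contiguous (3 seeks, one per collection, which is the minimum under this toy model). A production implementation would replace the brute-force search with a TSP heuristic.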
