Reclustering of high energy physics data
- 20 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 194-203
- https://doi.org/10.1109/ssdm.1999.787635
Abstract
The coming high energy physics experiments will store petabytes of data in object databases. Analysis jobs will frequently traverse collections containing millions of stored objects. Clustering is one of the most effective means to enhance the performance of these applications. The paper presents a reclustering algorithm for independent objects contained in multiple, possibly overlapping collections on secondary storage. The algorithm decomposes the stored objects into a number of independent chunks and then maps these chunks to a traveling salesman problem. Under a set of realistic assumptions, the number of disk seeks is reduced almost to the theoretical minimum. Experimental results obtained from a prototype are included.
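The abstract gives no implementation details, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes that a "chunk" is the group of objects sharing exactly the same set of collection memberships, that the cost between two chunks is the Hamming distance between their membership sets, and that a greedy nearest-neighbour tour stands in for a real traveling salesman solver. All function names (`build_chunks`, `order_chunks`, `layout`, `count_seeks`) are hypothetical.

```python
from collections import defaultdict


def build_chunks(collections):
    """Group objects by the exact set of collections that contain them (assumed chunk definition)."""
    membership = defaultdict(set)
    for name, objects in collections.items():
        for obj in objects:
            membership[obj].add(name)
    chunks = defaultdict(list)
    for obj, names in membership.items():
        chunks[frozenset(names)].append(obj)
    return {key: sorted(objs) for key, objs in chunks.items()}


def hamming(a, b):
    """Number of collections in which two membership sets differ (assumed cost)."""
    return len(a ^ b)


def order_chunks(chunks):
    """Greedy nearest-neighbour tour over chunk keys; a heuristic, not an optimal TSP solution."""
    remaining = sorted(chunks, key=lambda k: (len(k), sorted(k)))
    tour = [remaining.pop(0)]
    while remaining:
        last = tour[-1]
        # Ties are broken by the sorted collection names to keep the result deterministic.
        nxt = min(remaining, key=lambda k: (hamming(last, k), sorted(k)))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def layout(collections):
    """Return the objects in their proposed on-disk order."""
    chunks = build_chunks(collections)
    return [obj for key in order_chunks(chunks) for obj in chunks[key]]


def count_seeks(order, wanted):
    """Count the contiguous runs that must be read to fetch every wanted object."""
    seeks, prev_hit = 0, False
    for obj in order:
        hit = obj in wanted
        if hit and not prev_hit:
            seeks += 1
        prev_hit = hit
    return seeks


# Toy data: three overlapping collections of object ids (made up for illustration).
collections = {"A": {1, 3, 5, 7}, "B": {5, 7, 2, 4}, "C": {2, 4, 6, 8}}
new_order = layout(collections)
seeks_before = count_seeks(range(1, 9), collections["A"])
seeks_after = count_seeks(new_order, collections["A"])
```

On this toy input, reading collection A from objects stored in id order touches four separate runs, while the reordered layout places A's objects in a single contiguous run. The model of one seek per contiguous run is a simplification chosen for the sketch.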