Optimizing retrieval and processing of multi-dimensional scientific datasets

7 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 405-410
https://doi.org/10.1109/ipdps.2000.846013

Abstract

We have developed the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work, we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models that predict the relative performance of the strategies for the case where input data elements are uniformly distributed in the attribute space of the output dataset and the output dataset is restricted to a regular d-dimensional array.
