Mining block correlations to improve storage performance
- 1 May 2005
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Storage
- Vol. 1 (2) , 213-245
- https://doi.org/10.1145/1063786.1063790
Abstract
Block correlations are common semantic patterns in storage systems. They can be exploited for improving the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough for discovering block correlations in storage systems.

In this article, we propose two algorithms, C-Miner and C-Miner*, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirement, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications; compared with C-Miner, C-Miner* is redesigned for mining block correlations by making concessions for the specific problem of long sequences in storage system traces. Therefore, C-Miner* can discover 7–109% more correlation rules within 2–15 times shorter time than C-Miner. Moreover, we have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12–30% compared to the base case, and 7–25% compared to the commonly used sequential prefetching scheme for most workloads.
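To make the idea of mining block correlations from an access trace concrete, the following is a minimal Python sketch. It is not C-Miner or C-Miner*, whose details the abstract does not give. It only counts ordered block pairs that recur close together in a trace, which is the simplest case of frequent sequence mining (subsequences of length 2). The function name `mine_pair_rules`, the parameters `window`, `max_gap`, and `min_support`, the choice to cut the trace into fixed-size segments, and the sample trace are all assumptions made for illustration.

```python
from collections import Counter


def mine_pair_rules(trace, window=4, max_gap=2, min_support=2):
    """Return {(a, b): (support, confidence)} for rules "a is followed by b".

    Illustrative simplification, not the published algorithm:
    - the trace is cut into non-overlapping segments of `window` accesses;
    - (a, b) is counted at most once per segment, and only when b occurs
      within `max_gap` accesses after a;
    - support is the number of segments containing the pair;
    - confidence is support divided by the number of segments containing a.
    """
    pair_support = Counter()
    block_support = Counter()
    for start in range(0, len(trace), window):
        segment = trace[start:start + window]
        seen_pairs = set()
        for i, a in enumerate(segment):
            # Look only at the next `max_gap` accesses after position i.
            for b in segment[i + 1:i + 1 + max_gap]:
                if a != b:
                    seen_pairs.add((a, b))
        pair_support.update(seen_pairs)
        block_support.update(set(segment))
    return {
        pair: (count, count / block_support[pair[0]])
        for pair, count in pair_support.items()
        if count >= min_support
    }


# Hypothetical trace of block numbers, invented for this example.
trace = [1, 2, 9, 3, 1, 2, 7, 3, 1, 5, 2, 8]
rules = mine_pair_rules(trace, window=4, max_gap=2, min_support=3)
```

With this made-up trace, the only pair that reaches a support of 3 is block 1 followed by block 2. A correlation-directed prefetcher could use such a rule to fetch block 2 when block 1 is read, or a layout policy could place the two blocks adjacently. How the paper itself applies its rules is not specified in the abstract.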