File classification in self-* storage systems
- 10 June 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. cmu cs 2 1, 44-51
- https://doi.org/10.1109/icac.2004.1301346
Abstract
To tune and manage themselves, file and storage systems must understand key properties (e.g., access pattern, lifetime, size) of their various files. This paper describes how systems can automatically learn to classify the properties of files (e.g., read-only access pattern, short-lived, small in size) and predict the properties of new files, as they are created, by exploiting the strong associations between a file's properties and the names and attributes assigned to it. These associations exist, strongly but differently, in each of four real NFS environments studied. Decision tree classifiers can automatically identify and model such associations, providing prediction accuracies that often exceed 90%. Such predictions can be used to select storage policies (e.g., disk allocation schemes and replication factors) for individual files. Further, changes in associations can expose information about applications, helping autonomic system components distinguish growth from fundamental change.
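The idea of predicting a file's properties from its name and attributes at creation time can be illustrated with a small sketch. The code below is not the paper's classifier: it is a minimal ID3-style decision tree (information gain over categorical attributes), and the attribute names (`ext`, `mode`, `dir`), the labels, and every training record are hypothetical values invented for illustration.

```python
import math
from collections import Counter

# Hypothetical training records: attributes known at create time, plus the
# property later observed for the file. These values are made up for
# illustration and do not come from the paper's NFS traces.
TRAINING = [
    ({"ext": "lock", "mode": "0600", "dir": "mail"}, "short-lived"),
    ({"ext": "tmp", "mode": "0644", "dir": "build"}, "short-lived"),
    ({"ext": "o", "mode": "0644", "dir": "build"}, "short-lived"),
    ({"ext": "c", "mode": "0644", "dir": "src"}, "long-lived"),
    ({"ext": "h", "mode": "0644", "dir": "src"}, "long-lived"),
    ({"ext": "mbox", "mode": "0600", "dir": "mail"}, "long-lived"),
    ({"ext": "tmp", "mode": "0600", "dir": "mail"}, "short-lived"),
    ({"ext": "c", "mode": "0600", "dir": "src"}, "long-lived"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, attributes):
    """Return the attribute whose split yields the highest information gain."""
    base = entropy([label for _, label in rows])

    def gain(attr):
        groups = {}
        for feats, label in rows:
            groups.setdefault(feats[attr], []).append(label)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, attributes):
    """Recursively build a tree; leaves are labels, inner nodes are dicts."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(rows, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for feats, label in rows:
        branches.setdefault(feats[attr], []).append((feats, label))
    return {
        "attr": attr,
        "default": majority,  # used when a new file has an unseen attribute value
        "branches": {v: build_tree(sub, remaining) for v, sub in branches.items()},
    }


def predict(tree, feats):
    """Walk the tree for a newly created file's attributes and return a label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(feats.get(tree["attr"]), tree["default"])
    return tree


tree = build_tree(TRAINING, ["ext", "mode", "dir"])
new_file = {"ext": "tmp", "mode": "0644", "dir": "build"}
print(tree["attr"], predict(tree, new_file))
```

On this toy data the extension alone separates the two classes, so the tree splits on `ext` first. A storage system could then map the predicted label to a policy, for example a cheaper allocation scheme for files predicted to be short-lived, though that mapping is left out of the sketch.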