Improved disk-drive failure warnings
- 7 November 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Reliability
- Vol. 51 (3) , 350-357
- https://doi.org/10.1109/tr.2002.802886
Abstract
Improved methods are proposed for disk-drive failure prediction. The SMART (self monitoring and reporting technology) failure prediction system is currently implemented in disk-drives. Its purpose is to predict the near-term failure of an individual hard disk-drive, and issue a backup warning to prevent data loss. Two experimental tests of SMART show only moderate accuracy at low false-alarm rates. (A rate of 0.2% of total drives per year implies that 20% of drive returns would be good drives, relative to an ≈1% annual failure rate of drives). This requirement for very low false-alarm rates is well known in medical diagnostic tests for rare diseases, and methodology used there suggests ways to improve SMART. Two improved SMART algorithms are proposed. They use the SMART internal drive attribute measurements in present drives. The present warning-algorithm based on maximum error thresholds is replaced by distribution-free statistical hypothesis tests. These improved algorithms are computationally simple enough to be implemented in drive microprocessor firmware code. They require only integer sort operations to put several hundred attribute values in rank order. Some tens of these ranks are added up and the SMART warning is issued if the sum exceeds a prestored limit. These new algorithms were tested on 3744 drives of 2 models. They gave 3-4 times higher correct prediction accuracy than error thresholds on will-fail drives, at 0.2% false-alarm rate. The highest accuracies achievable are modest (40%-60%). Care was taken to test will-fail drive prediction accuracy on data independent of the algorithm design data. Additional work is needed to verify and apply these algorithms in actual drive design. They can also be useful in drive failure analysis engineering. It might be possible to screen drives in manufacturing using SMART attributes. Marginal drives might be detected before substantial final test time is invested in them, thereby decreasing manufacturing cost, and possibly decreasing overall field failure rates.
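The rank-order step described in the abstract (sort the attribute values, add up some of the ranks, warn if the sum exceeds a prestored limit) can be illustrated with a minimal rank-sum sketch. This is not the authors' firmware algorithm: the function names, the split into a `reference` sample and a `recent` sample, the assumption that larger attribute values indicate a worse drive, and the `limit` value are all hypothetical choices made for illustration. Ranks are doubled so that tied values can receive midranks while the arithmetic stays in integers.

```python
def doubled_rank_sum(reference, recent):
    """Return twice the sum of the midranks of `recent` in the pooled sample.

    `reference` holds attribute values taken as typical of good drives and
    `recent` holds the latest values from the drive under test (both are
    hypothetical inputs, not names from the paper). Doubling the ranks keeps
    every intermediate value an integer even when ties share a midrank.
    """
    # Tag each value with its origin, then sort the pooled sample once.
    pooled = sorted([(v, 0) for v in reference] + [(v, 1) for v in recent])
    total = 0
    i = 0
    n = len(pooled)
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # The run occupies 1-based ranks i+1 .. j; twice their mean is i+1+j.
        doubled_midrank = i + 1 + j
        # Add that rank once for every `recent` value inside the run.
        total += doubled_midrank * sum(flag for _, flag in pooled[i:j])
        i = j
    return total


def smart_warning(reference, recent, limit):
    """Warn when the doubled rank sum exceeds a prestored limit.

    Assumes larger attribute values are worse; `limit` is a hypothetical
    constant that would be chosen offline for a target false-alarm rate.
    """
    return doubled_rank_sum(reference, recent) > limit


# Example with made-up integer attribute values.
reference_values = [1, 2, 3, 4, 5, 6]
suspect_values = [7, 8]   # both rank above every reference value
healthy_values = [0, 1]   # rank at or below the reference values
example_limit = 26        # hypothetical prestored limit (doubled-rank units)

warn_suspect = smart_warning(reference_values, suspect_values, example_limit)
warn_healthy = smart_warning(reference_values, healthy_values, example_limit)
```

In this toy example the suspect values take the top two ranks of the pooled sample and trigger the warning, while the healthy values do not. In practice the limit would have to be calibrated on good-drive data so that the false-alarm rate stays at the very low level the abstract calls for.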