Speeding up external mergesort
- 1 April 1996
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 8 (2) , 322-332
- https://doi.org/10.1109/69.494169
Abstract
External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.Keywords
This publication has 11 references indexed in Scilit:
- Prefetching with multiple disks for external mergesort: simulation and analysisPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- Speeding up external mergesortIEEE Transactions on Knowledge and Data Engineering, 1996
- Merging multiple lists on hierarchical-memory multiprocessorsJournal of Parallel and Distributed Computing, 1991
- FastSort: a distributed single-input single-output external sortPublished by Association for Computing Machinery (ACM) ,1990
- Optimal disk I/O with parallel block transferPublished by Association for Computing Machinery (ACM) ,1990
- Performance comparison of distributive and mergesort as external sorting algorithmsJournal of Systems and Software, 1989
- Merging sorted runs using large main memoryActa Informatica, 1989
- Sorting large files on a backend multiprocessorIEEE Transactions on Computers, 1988
- Parallel sorting algorithms for tightly coupled multiprocessorsParallel Computing, 1988
- The I/O Performance of Multiway Mergesort and Tag SortIEEE Transactions on Computers, 1985