High Performance FFT Algorithms for Cache-Coherent Multiprocessors
- 1 May 1999
- journal article
- research article
- Published by SAGE Publications in The International Journal of High Performance Computing Applications
- Vol. 13 (2) , 163-171
- https://doi.org/10.1177/109434209901300206
Abstract
Computing one-dimensional fast Fourier transforms (FFTs) on microprocessors requires different algorithms, depending on whether the problem fits in the data cache. This paper describes efficient algorithms for both cases. Some implementations of out-of-cache one-dimensional FFTs use a six-step approach to reduce the number of cache misses. The six-step approach may be altered into a seven-step approach that allows increased data cache reuse. A natural parallelism is also developed. Performance results using these techniques are given for the Hewlett-Packard HP 9000 V-Class V2250 server.Keywords
This publication has 7 references indexed in Scilit:
- Fast Radix 2, 3, 4, and 5 Kernels for Fast Fourier Transformations on Computers with Overlapping Multiply--Add InstructionsSIAM Journal on Scientific Computing, 1997
- High-performance FFT algorithms for the Convex C4/XA supercomputerThe Journal of Supercomputing, 1995
- FFTs in external or hierarchical memoryThe Journal of Supercomputing, 1990
- A High-Performance FFT Algorithm for Vector SupercomputersThe International Journal of Supercomputing Applications, 1988
- A high-performance fast Fourier transform algorithm for the Cray-2The Journal of Supercomputing, 1987
- What is the fast Fourier transform?IEEE Transactions on Audio and Electroacoustics, 1967
- An algorithm for the machine calculation of complex Fourier seriesMathematics of Computation, 1965