Parallelization and Performance Analysis of the Cooley–Tukey FFT Algorithm for Shared-Memory Architectures
- 1 May 1987
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. C-36 (5) , 581-591
- https://doi.org/10.1109/tc.1987.1676943
Abstract
We present here a study of parallelization of the Cooley-Tukey radix two FFT algorithm for MIMD (nonvector) architectures. Parallel algorithms are presented for one and multidimensional Fourier transforms. From instruction traces obtained by executing Fortran kernels derived from our algorithms, we determined the precise instructions to be executed by each processor in the parallel system. We used these instruction races to predict the performance of the IBM Research Parallel Processing Prototype, RP3, as a computer of FFT's. Our performance results are depicted in graphs included in this paper.Keywords
This publication has 12 references indexed in Scilit:
- Allocating Independent Subtasks on Parallel ProcessorsIEEE Transactions on Software Engineering, 1985
- The 801 MinicomputerIBM Journal of Research and Development, 1983
- Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential ProcessorsACM Transactions on Programming Languages and Systems, 1983
- The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel ComputerIEEE Transactions on Computers, 1983
- Networks and algorithms for very-large-scale parallel computationComputer, 1982
- UltracomputersACM Transactions on Programming Languages and Systems, 1980
- Access and Alignment of Data in an Array ProcessorIEEE Transactions on Computers, 1975
- An Adaptation of the Fast Fourier Transform for Parallel ProcessingJournal of the ACM, 1968
- An algorithm for the machine calculation of complex Fourier seriesMathematics of Computation, 1965
- Predicting the Performance of Tropospheric Communication Links, Singly and in TandemIEEE Transactions on Communications, 1962