Two approaches to the concurrent implementation of the prime factor algorithm on a hypercube

Abstract
On sequential computers, the prime factor algorithm (PFA) allows the Computation of the discrete Fourier transform (DFT) with a higher efficiency than the traditional Cooley‐Tukey FFT algorithm (CTA). However, the PFA requires substantial data movement, which poses a challenging problem for distributed‐memory multi‐processor systems. In this paper, two approaches for a concurrent implementation of the PFA on these structures are presented. In the first approach, the concurrent PFA runs on all nodes of the multi‐processor system, which is inefficient on large configurations due to the large communication overhead. A second approach developed to reduce this bottleneck is also presented. These solutions have been benchmarked on Caltech hypercubes, and the performances achieved are reported. In both approaches, thecrystal_routeralgorithm was exploited as a concurrent technique for communicating data among nodes.

This publication has 6 references indexed in Scilit: