Experience with ADI-FDTD techniques on the Cray MTA supercomputer

Abstract
Finite difference, time domain (FDTD) simulations are important to the design cycle for optical communications devices. High spatial resolution is essential, and the Courant condition limits the time step, making this problem require the level of high-performance system usually only available at a remote center. Model definition and result visualization can be done locally. Recent application of the alternating direction implicit (ADI) method to FDTD removes the Courant condition, promising larger time steps for meaningful turnaround in simulations. At each time step, tridiagonal equations are solved over single dimensions of a 3D problem, but all three dimensions are involved in each time step. Thus, for a distributed memory multiprocessor, no partition of the data prevents tridiagonals from crossing processors without remapping every time step. Likewise, for cache based or vector computers, there is a stride of NxN for tridiagonals at every time step for a NxNxN grid. There is plenty of parallelism, because NxN tridiagonals can be solved simultaneously. This makes the problem well suited to a machine like the Cray multithreaded architecture (MTA) that has a large, flat memory and uses parallelism to hide memory latency. A Cray MTA implementation of the ADI-FDTD code executes serial tridiagonal solvers in parallel on multiple threads and successfully hides memory latency, achieving just over one FLOP per clock cycle per processor for a 200x200x200 grid on an 8 processor system at the San Diego Supercomputer Center. The 8 processor speed is 2.06 Gflop and the efficiency is 98%. Comparing one MTA processor, with a 250 MHz clock to a 500 MHz Alpha processor, the MTA is three times as fast for a 50x50x50 grid problem size. A vectorized version of the code run on one Cray T90 processor is three times faster than one MTA processor for a 100x100x100 grid size.

This publication has 0 references indexed in Scilit: