Statistical Simulations on Parallel Computers

1 December 2004

journal article
Published by Taylor & Francis in Journal of Computational and Graphical Statistics

Vol. 13 (4) , 886-906
https://doi.org/10.1198/106186004x12605

Abstract

The potential benefits of parallel computing for time-consuming statistical applications are well known, but have not been widely realized in practice, perhaps in part due to associated technical obstacles. This article develops a simple framework for programming statistical simulations using parallel processing, which does not require changing programming language or forgoing the use of standard statistical libraries. The basic idea of using parallel computing for statistical simulation studies is straightforward in principle, and is based on the standard master-slave model. However, there are several technical obstacles that can make it difficult to implement in practice. These include: nonreproducibility of results due to variations in the distribution of random numbers among processes, creation of excessive numbers of slaves, proliferation of slaves with very short lifetimes, and slaves destroyed due to hardware failures. This article proposes solutions for each of these difficulties, and together these solutions constitute an overall parallel computing framework for statistical simulation studies. In an experiment with 15 processors, the methods detailed here led to increases in speed by factors that can actually exceed the maximum expected factor of 15, due to the efficiencies of the proposed problem decomposition methods. Different gains may be achieved with different strategies, depending on the problem decomposition used and heterogeneity of the processors. Fault tolerance is an important feature of the framework. In an experiment with faults, a non-fault-tolerant version of our method took almost twice as long, and did not produce any results, while the fault-tolerant method dealt efficiently with the faults. We conclude that parallel computing can greatly improve the efficiency of statistical computation without greatly increasing programming complexity, and that it deserves wider investigation for such applications.
Software to implement the proposed framework in R is available from http://www.stat.washington.edu/hana.
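
The reproducibility obstacle named in the abstract (results that change with how random numbers are distributed among processes) can be illustrated with a minimal sketch. This is not the article's R framework: the function names `run_replicate` and `simulate` are hypothetical, a thread pool stands in for the worker processes, and Python's built-in generator stands in for a generator designed for parallel streams. The point shown is only that keying each replicate's random stream to the replicate, rather than to the worker that runs it, makes the output independent of the number of workers.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def run_replicate(base_seed: int, rep_id: int, n: int = 1000) -> float:
    """One simulation replicate: the mean of n standard normal draws.

    The generator is keyed by (base_seed, rep_id), not by the worker that
    happens to run the task, so the result does not depend on scheduling.
    """
    # A string seed is hashed deterministically by random.Random.
    rng = random.Random(f"{base_seed}-{rep_id}")
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))


def simulate(base_seed: int, n_reps: int, n_workers: int) -> list:
    """Master: hand replicates to a pool of workers, collect results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(lambda r: run_replicate(base_seed, r), range(n_reps)))


if __name__ == "__main__":
    one_worker = simulate(base_seed=2004, n_reps=20, n_workers=1)
    four_workers = simulate(base_seed=2004, n_reps=20, n_workers=4)
    print(one_worker == four_workers)
```

Seeding by replicate makes the run repeatable, but it does not by itself guarantee that the streams are statistically independent of one another; a real study would want a generator built for that purpose.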
