Statistical Simulations on Parallel Computers

Abstract
The potential benefits of parallel computing for time-consuming statistical applications are well known, but they have not been widely realized in practice, perhaps partly because of the associated technical obstacles. This article develops a simple framework for programming statistical simulations using parallel processing that requires neither changing programming language nor giving up standard statistical libraries. The basic idea of using parallel computing for statistical simulation studies is straightforward in principle and is based on the standard master-slave model. However, several technical obstacles can make it difficult to implement in practice. These include: nonreproducibility of results due to variations in how random numbers are distributed among processes, creation of excessive numbers of slaves, proliferation of slaves with very short lifetimes, and loss of slaves due to hardware failures. This article proposes a solution for each of these difficulties, and together these solutions constitute an overall parallel computing framework for statistical simulation studies. In an experiment with 15 processors, the methods detailed here yielded speedup factors that can exceed the nominal maximum of 15, owing to the efficiency of the proposed problem decomposition methods. Different strategies may achieve different gains, depending on the problem decomposition used and the heterogeneity of the processors. Fault tolerance is an important feature of the framework. In an experiment with faults, a non-fault-tolerant version of our method took almost twice as long and produced no results, while the fault-tolerant method dealt efficiently with the faults. We conclude that parallel computing can greatly improve the efficiency of statistical computation without greatly increasing programming complexity, and that it deserves wider investigation for such applications.
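The reproducibility and fault-tolerance ideas above can be illustrated with a minimal Python sketch. It is not the R framework the article describes. In this sketch a master farms out independent replications and resubmits any replication that fails. Each replication seeds its own generator from the pair (base seed, task id), so its result does not depend on which worker runs it or on how many times it had to be retried. Several assumptions apply. Seeds are derived by hashing with SHA-256, a simple stand-in for properly independent parallel random-number streams (such as L'Ecuyer-style streams), which hashing does not guarantee. Faults are simulated as exceptions through a hypothetical `flaky` set of task ids. The names `task_seed`, `replicate` and `run_study` are illustrative only.

```python
import hashlib
import random
import statistics
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def task_seed(base_seed: int, task_id: int) -> int:
    """Derive a seed from (base_seed, task_id) only, never from the worker that runs the task.

    Assumption: SHA-256 hashing stands in for dedicated parallel RNG streams.
    """
    digest = hashlib.sha256(f"{base_seed}:{task_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replicate(task_id: int, base_seed: int, attempt: int, flaky: frozenset) -> tuple[int, float]:
    """One simulation replication: the mean of 1000 standard-normal draws."""
    if attempt == 0 and task_id in flaky:
        # Hypothetical injected fault: the task fails on its first attempt only.
        raise RuntimeError(f"simulated worker failure on task {task_id}")
    rng = random.Random(task_seed(base_seed, task_id))  # private generator per task
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return task_id, statistics.fmean(sample)


def run_study(n_reps: int, base_seed: int, executor: Executor,
              flaky: frozenset = frozenset(), max_retries: int = 3) -> list[float]:
    """Master loop: submit all replications, resubmit failed ones, return results in task order."""
    results: dict[int, float] = {}
    attempts = {i: 0 for i in range(n_reps)}
    todo = list(range(n_reps))
    while todo:
        futures = {executor.submit(replicate, i, base_seed, attempts[i], flaky): i for i in todo}
        todo = []
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                _, value = fut.result()
                results[i] = value
            except RuntimeError:
                attempts[i] += 1
                if attempts[i] > max_retries:
                    raise  # give up on a task that keeps failing
                todo.append(i)  # resubmit; the same seed gives the same result
    return [results[i] for i in range(n_reps)]


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        means = run_study(20, base_seed=2004, executor=pool, flaky=frozenset({3, 7}))
    print(len(means), round(statistics.fmean(means), 4))
```

Each result depends only on (base seed, task id). The output is therefore identical whether one worker or many run the study, and whether or not some tasks had to be resubmitted. A real hardware failure that kills a worker process would instead surface as `concurrent.futures.process.BrokenProcessPool`. This sketch does not handle that case, and recovering from it would require creating a new pool.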
Software to implement the proposed framework in R is available from http://www.stat.washington.edu/hana.
