Constructing Summary Statistics for Approximate Bayesian Computation: Semi-Automatic Approximate Bayesian Computation

Open Access

15 May 2012

journal article
Published by Oxford University Press (OUP) in Journal of the Royal Statistical Society Series B: Statistical Methodology

Vol. 74 (3), 419-474
https://doi.org/10.1111/j.1467-9868.2011.01010.x

Abstract

Summary. Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
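The two-stage idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the toy simulator (i.i.d. Normal(theta, 1) data), the uniform prior, the choice of regressors in `features`, the use of ordinary least squares to approximate the posterior mean, and the quantile-based acceptance rule. A pilot stage simulates (parameter, data) pairs and fits a regression that estimates how the posterior mean varies with the data; the fitted function is then used as the summary statistic in a plain rejection ABC run.

```python
import numpy as np

def simulate(theta, n, rng):
    """Toy simulator (an assumption): n i.i.d. Normal(theta, 1) draws."""
    return rng.normal(theta, 1.0, size=n)

def features(y):
    """Hypothetical regressors f(y): intercept, raw data, squared data."""
    return np.concatenate(([1.0], y, y ** 2))

def uniform_prior(size, rng):
    """Assumed prior: theta ~ Uniform(-5, 5)."""
    return rng.uniform(-5.0, 5.0, size=size)

def fit_summary(n_pilot, n, prior_sampler, rng):
    """Pilot stage: regress theta on f(y) to approximate E[theta | y]."""
    thetas = prior_sampler(n_pilot, rng)
    X = np.array([features(simulate(t, n, rng)) for t in thetas])
    beta, *_ = np.linalg.lstsq(X, thetas, rcond=None)
    # The fitted linear predictor serves as the summary statistic.
    return lambda y: float(features(y) @ beta)

def rejection_abc(y_obs, summary, n_sims, keep_frac, n, prior_sampler, rng):
    """Rejection ABC: keep the parameter draws whose simulated summary
    lies closest to the observed summary."""
    thetas = prior_sampler(n_sims, rng)
    s_obs = summary(y_obs)
    dists = np.array([abs(summary(simulate(t, n, rng)) - s_obs) for t in thetas])
    eps = np.quantile(dists, keep_frac)  # tolerance set by acceptance fraction
    return thetas[dists <= eps]

rng = np.random.default_rng(0)
n = 20
y_obs = simulate(1.5, n, rng)  # stand-in for observed data
summary = fit_summary(2000, n, uniform_prior, rng)
posterior = rejection_abc(y_obs, summary, 5000, 0.02, n, uniform_prior, rng)
abc_mean = posterior.mean()
print(f"accepted draws: {len(posterior)}, ABC posterior mean: {abc_mean:.3f}")
```

In this toy model the likelihood is actually tractable, so the sketch serves only to show the mechanics: the accepted draws should concentrate near the sample mean of `y_obs`. For a model with an intractable likelihood, only `simulate`, `features` and the prior would need to change.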
