Constructing Summary Statistics for Approximate Bayesian Computation: Semi-Automatic Approximate Bayesian Computation
Open Access
- 15 May 2012
- journal article
- Published by Oxford University Press (OUP) in Journal of the Royal Statistical Society Series B: Statistical Methodology
- Vol. 74 (3), 419-474
- https://doi.org/10.1111/j.1467-9868.2011.01010.x
Abstract
Summary. Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
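The two-stage idea in the abstract (an extra round of simulation to estimate how the posterior mean varies with the data, then using that estimate as the summary statistic inside ABC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy Normal-mean model, the Uniform(-5, 5) prior, the feature vector in `features`, and the helper names `simulate`, `fit_summary` and `abc_rejection` are all hypothetical. Stage one regresses the parameter on data features by least squares, so the fitted value approximates the posterior mean. Stage two plugs that fitted value into a rejection ABC that keeps the closest fraction of simulations, which is a nearest-neighbour tolerance rather than a fixed threshold.

```python
import numpy as np

# Hypothetical toy model (an assumption for illustration): N_OBS i.i.d. draws
# y_i ~ Normal(theta, 1), with prior theta ~ Uniform(-5, 5).
N_OBS = 20


def simulate(theta, rng):
    """Simulate one dataset of N_OBS observations for parameter theta."""
    return rng.normal(loc=theta, scale=1.0, size=N_OBS)


def sample_prior(size, rng):
    """Draw parameter values from the assumed Uniform(-5, 5) prior."""
    return rng.uniform(-5.0, 5.0, size=size)


def features(y):
    """Hypothetical candidate features f(y) fed to the regression."""
    return np.array([y.mean(), np.median(y), y.var(), y.min(), y.max()])


def fit_summary(n_pilot, rng):
    """Stage 1: simulate (theta, y) pairs and regress theta on f(y).

    The fitted linear predictor is an estimate of E[theta | y]; the
    returned function uses it as the ABC summary statistic.
    """
    thetas = sample_prior(n_pilot, rng)
    X = np.array([features(simulate(t, rng)) for t in thetas])
    X1 = np.column_stack([np.ones(n_pilot), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, thetas, rcond=None)

    def summary(y):
        return coef[0] + features(y) @ coef[1:]

    return summary


def abc_rejection(y_obs, summary, n_sims, accept_frac, rng):
    """Stage 2: rejection ABC keeping the accept_frac of prior draws whose
    simulated summary lies closest to the observed summary."""
    s_obs = summary(y_obs)
    thetas = sample_prior(n_sims, rng)
    dists = np.array([abs(summary(simulate(t, rng)) - s_obs) for t in thetas])
    n_keep = max(1, int(accept_frac * n_sims))
    keep = np.argsort(dists)[:n_keep]
    return thetas[keep]


rng = np.random.default_rng(0)
y_obs = simulate(1.5, rng)  # "observed" data generated with theta = 1.5
summary = fit_summary(n_pilot=2000, rng=rng)
posterior = abc_rejection(y_obs, summary, n_sims=5000, accept_frac=0.02, rng=rng)
print(posterior.mean(), posterior.std())
```

With a single parameter the summary is a scalar, so the distance is an absolute difference. With several parameters, one natural extension is to fit one regression per parameter and compare the resulting vector of fitted values with a (suitably scaled) vector distance.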