Abstract
Given a finite number of different experiments with unknown probabilities p1, p2, ···, pk of success, the multi-armed bandit problem is concerned with maximising the expected number of successes in a sequence of trials. There are many policies which ensure that the proportion of successes converges to p = max(p1, p2, ···, pk) in the long run. This property is established for a class of decision procedures which rely on randomisation, at each stage, in selecting the experiment for the next trial. Further, it is suggested that some of these procedures might perform well over any finite sequence of trials.
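The convergence property described above can be illustrated with a minimal sketch of one randomised policy of this general kind. The sketch below uses a decaying-exploration (epsilon-decreasing) rule, which is an assumption for illustration only — the paper's own class of procedures is not specified in this abstract. Each arm is selected at random with a probability that shrinks over time, and otherwise the empirically best arm is pulled, so the long-run proportion of successes approaches max(p1, ···, pk).

```python
import math
import random


def run_bandit(ps, n_trials, seed=0):
    """Simulate a randomised multi-armed bandit policy.

    At trial t, with probability eps_t a uniformly random arm is
    explored; otherwise the arm with the best observed success rate
    is exploited. eps_t decays like 1/sqrt(t), so every arm is tried
    infinitely often yet exploration vanishes, and the proportion of
    successes converges to max(ps).

    NOTE: this epsilon-decreasing rule is a hypothetical stand-in,
    not the specific class of procedures analysed in the paper.
    """
    rng = random.Random(seed)
    k = len(ps)
    pulls = [0] * k   # times each arm was tried
    wins = [0] * k    # successes observed per arm
    successes = 0
    for t in range(1, n_trials + 1):
        eps = min(1.0, 5.0 / math.sqrt(t))  # decaying exploration rate
        if rng.random() < eps or 0 in pulls:
            arm = rng.randrange(k)  # explore: pick an arm at random
        else:
            # exploit: pick the arm with the highest observed rate
            arm = max(range(k), key=lambda i: wins[i] / pulls[i])
        reward = 1 if rng.random() < ps[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        successes += reward
    return successes / n_trials
```

With two arms of success probabilities 0.2 and 0.8, the realised proportion of successes over a long run sits close to 0.8, the best arm's rate, as the abstract's convergence claim suggests.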