Randomised allocation of treatments in sequential trials
- 1 March 1980
- journal article
- Published by Cambridge University Press (CUP) in Advances in Applied Probability
- Vol. 12 (1), 174-182
- https://doi.org/10.2307/1426500
Abstract
Given a finite number of different experiments with unknown probabilities p1, p2, ···, pk of success, the multi-armed bandit problem is concerned with maximising the expected number of successes in a sequence of trials. There are many policies which ensure that the proportion of successes converges to p = max(p1, p2, ···, pk) in the long run. This property is established for a class of decision procedures which rely on randomisation, at each stage, in selecting the experiment for the next trial. Further, it is suggested that some of these procedures might perform well over any finite sequence of trials.
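The sketch below simulates one simple randomised allocation rule in this spirit: at each stage the next experiment is chosen at random, mixing a shrinking amount of uniform exploration with the arm whose smoothed empirical success rate is currently highest. This particular rule, and the names used in the code, are illustrative assumptions rather than the paper's own class of procedures; the sketch only shows how the observed proportion of successes can approach max(p1, ···, pk) under such a randomised scheme.

```python
import random


def run_trials(true_p, n_trials, seed=1):
    """Simulate a k-armed Bernoulli bandit under a simple randomised
    allocation rule (an illustrative sketch, not the paper's procedure)."""
    rng = random.Random(seed)
    k = len(true_p)
    successes = [0] * k
    pulls = [0] * k
    total = 0

    for t in range(1, n_trials + 1):
        # Randomise the choice of experiment at each stage: with a
        # probability that shrinks over time, pick an arm uniformly at
        # random; otherwise pick the arm with the best smoothed rate.
        if rng.random() < 1.0 / t ** 0.5:
            arm = rng.randrange(k)
        else:
            rates = [(successes[i] + 1) / (pulls[i] + 2) for i in range(k)]
            arm = max(range(k), key=lambda i: rates[i])

        # Run the chosen experiment as a Bernoulli trial.
        outcome = 1 if rng.random() < true_p[arm] else 0
        successes[arm] += outcome
        pulls[arm] += 1
        total += outcome

    return total / n_trials


if __name__ == "__main__":
    # With success probabilities (0.3, 0.5, 0.7), the observed proportion
    # of successes should drift toward p = 0.7 as the number of trials grows.
    print(run_trials([0.3, 0.5, 0.7], n_trials=50000))
```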