Adaptive Treatment Allocation and the Multi-Armed Bandit Problem
Open Access
- 1 September 1987
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 15 (3) , 1091-1114
- https://doi.org/10.1214/aos/1176350495
Abstract
A class of simple adaptive allocation rules is proposed for the problem (often called the "multi-armed bandit problem") of sampling $x_1, \cdots x_N$ sequentially from $k$ populations with densities belonging to an exponential family, in order to maximize the expected value of the sum $S_N = x_1 + \cdots + x_N$. These allocation rules are based on certain upper confidence bounds, which are developed from boundary crossing theory, for the $k$ population parameters. The rules are shown to be asymptotically optimal as $N \rightarrow \infty$ from both Bayesian and frequentist points of view. Monte Carlo studies show that they also perform very well for moderate values of the horizon $N$.
Keywords
This publication has 0 references indexed in Scilit: