Finite-time lower bounds for the two-armed bandit problem
- 1 April 2000
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Automatic Control
- Vol. 45 (4), 711-714
- https://doi.org/10.1109/9.847107
Abstract
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem.We provide a finite-sample minimax version of the well-known log n asymptotic lowerbound of Lai and Robbins. The finite-time lower lower bound allows us to derive conditionsfor the amount of time necessary to make any significant gain over a random guessingstrategy. These bounds depend on the class of possible distributions of the rewards associatedwith the arms. For example, in contrast to the log n...Keywords
References
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space. IEEE Transactions on Automatic Control, 1989
- Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Transactions on Automatic Control, 1988
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards. IEEE Transactions on Automatic Control, 1987
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards. IEEE Transactions on Automatic Control, 1987
- Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985
- Bandit problems. Springer Nature, 1985
- Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 1952
- On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika, 1933