Finite-time lower bounds for the two-armed bandit problem

Abstract
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem.We provide a finite-sample minimax version of the well-known log n asymptotic lowerbound of Lai and Robbins. The finite-time lower lower bound allows us to derive conditionsfor the amount of time necessary to make any significant gain over a random guessingstrategy. These bounds depend on the class of possible distributions of the rewards associatedwith the arms. For example, in contrast to the log n...