Finite Horizon Behavior of Policies for Two-Arm Bandits
- 1 December 1974
- research article
- Published in the Journal of the American Statistical Association (available via JSTOR)
- Vol. 69 (348), 963
- https://doi.org/10.2307/2286172
Abstract
Monte Carlo experiments indicate that the widely publicized “Play-the-Winner” strategy for two-arm bandits does badly relative to certain other strategies for moderate finite horizons, say, more than 50 trials. These latter strategies are also long-run average optimal, whereas “Play-the-Winner” is not. From the experiments, a particular long-run average optimal strategy is recommended on the basis of its empirical finite horizon behavior. Some implications for sequential testing are indicated.
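For context, the “Play-the-Winner” rule discussed in the abstract keeps pulling the same arm after a success and switches arms after a failure. A minimal sketch of such a rule on a two-arm Bernoulli bandit follows; the function name, arm probabilities, and horizon are illustrative assumptions, not taken from the paper, and this is not a reproduction of the paper's Monte Carlo experiments.

```python
import random

def play_the_winner(p, horizon, seed=0):
    """Simulate the Play-the-Winner rule on a two-arm Bernoulli bandit.

    p: pair of success probabilities for the two arms (assumed values,
       for illustration only).
    horizon: number of trials (the paper's finite horizon).
    Returns the total number of successes over the horizon.
    """
    rng = random.Random(seed)
    arm = 0          # start on arm 0 (an arbitrary convention)
    successes = 0
    for _ in range(horizon):
        if rng.random() < p[arm]:
            successes += 1   # success: stay on the same arm
        else:
            arm = 1 - arm    # failure: switch to the other arm
    return successes
```

Running this rule against a strategy that estimates each arm's success rate would let one reproduce, in spirit, the kind of finite-horizon comparison the abstract describes.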