On the Bernoulli two-armed bandit problem
- 1 January 1981
- journal article
- research article
- Published by Taylor & Francis in Mathematische Operationsforschung und Statistik. Series Optimization
- Vol. 12 (2), 307-316
- https://doi.org/10.1080/02331938108842729
Abstract
The paper is initially concerned with monotonic properties of the posterior success probabilities when the prior success probabilities are distributed according to an arbitrary joint distribution function (Bayesian approach). Next a dynamic programming model is proposed and monotonic properties of the optimal expected cumulative discounted reward are proved. Finally, optimality properties are given for the case when one prior success probability is known.
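The kind of dynamic programming recursion the abstract refers to can be illustrated with a small sketch for the special case in which one arm's success probability is known. This is not the paper's model: it assumes a Beta(a, b) prior on the unknown arm (the paper allows an arbitrary joint prior), truncates the problem to a finite number of pulls, and uses hypothetical names (`make_value_function`, `known_p`, `discount`, `horizon`) chosen only for illustration.

```python
from functools import lru_cache


def make_value_function(known_p, discount):
    """Build V(a, b, n) for a two-armed Bernoulli bandit with one known arm.

    Assumptions (not from the paper): the unknown arm has a Beta(a, b)
    posterior with integer parameters, and only n pulls remain.
    """

    @lru_cache(maxsize=None)
    def value(a, b, n):
        # No pulls left: nothing more can be earned.
        if n == 0:
            return 0.0
        # Posterior success probability of the unknown arm under Beta(a, b).
        mean = a / (a + b)
        # Pulling the known arm leaves the posterior unchanged.
        pull_known = known_p + discount * value(a, b, n - 1)
        # Pulling the unknown arm updates the posterior on success or failure.
        pull_unknown = (
            mean * (1.0 + discount * value(a + 1, b, n - 1))
            + (1.0 - mean) * discount * value(a, b + 1, n - 1)
        )
        return max(pull_known, pull_unknown)

    return value


# Example: known arm succeeds with probability 0.5, discount factor 0.9.
v = make_value_function(known_p=0.5, discount=0.9)
two_step_value = v(1, 1, 2)
```

In this sketch the value is non-decreasing in the number of observed successes and non-increasing in the number of observed failures, which mirrors the type of monotonicity property the abstract describes, here only under the Beta-prior assumption.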
This publication has 4 references indexed in Scilit:
- Bernoulli One-Armed Bandits--Arbitrary Discount Sequences. The Annals of Statistics, 1979
- On the two armed bandit with one probability known. Metrika, 1978
- Some results for the two armed bandit problem. Mathematische Operationsforschung und Statistik, 1976
- Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Probability Theory and Related Fields, 1975