On the Bernoulli two-armed bandit problem

1 January 1981

journal article
research article
Published by Taylor & Francis in Mathematische Operationsforschung und Statistik. Series Optimization

Vol. 12 (2), 307-316
https://doi.org/10.1080/02331938108842729

Abstract

The paper first establishes monotonicity properties of the posterior success probabilities when the success probabilities of the two arms follow an arbitrary joint prior distribution (Bayesian approach). Next, a dynamic programming model is proposed, and monotonicity properties of the optimal expected cumulative discounted reward are proved. Finally, optimality properties are given for the case in which one of the success probabilities is known.
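
As a rough illustration of the kind of dynamic programming model the abstract refers to, the following sketch computes the optimal expected discounted reward for a finite number of pulls. It is not the paper's formulation: it assumes independent Beta priors on the two arms (the paper allows an arbitrary joint prior), a finite horizon, and a geometric discount factor. The function name `optimal_value` and all parameter names are hypothetical.

```python
from functools import lru_cache


def optimal_value(a1, b1, a2, b2, horizon, discount=0.9):
    """Optimal expected discounted reward over `horizon` pulls.

    Assumption (not from the paper): arm i has an independent
    Beta(a_i, b_i) prior, so its posterior mean is a_i / (a_i + b_i).
    """

    @lru_cache(maxsize=None)
    def v(a1, b1, a2, b2, n):
        if n == 0:
            return 0.0
        # Posterior success probabilities given the counts observed so far.
        p1 = a1 / (a1 + b1)
        p2 = a2 / (a2 + b2)
        # Pull arm 1: a success pays 1 and moves a1 up, a failure moves b1 up.
        pull1 = (p1 * (1.0 + discount * v(a1 + 1, b1, a2, b2, n - 1))
                 + (1.0 - p1) * discount * v(a1, b1 + 1, a2, b2, n - 1))
        # Pull arm 2: same recursion with the roles of the arms swapped.
        pull2 = (p2 * (1.0 + discount * v(a1, b1, a2 + 1, b2, n - 1))
                 + (1.0 - p2) * discount * v(a1, b1, a2, b2 + 1, n - 1))
        return max(pull1, pull2)

    return v(a1, b1, a2, b2, horizon)


if __name__ == "__main__":
    # Uniform priors on both arms, five pulls.
    print(optimal_value(1, 1, 1, 1, horizon=5))
```

Under these assumptions the computed value does not decrease when a success is added to either arm's count, which is one simple instance of the monotonicity the abstract describes for the optimal reward.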
