On the Bernoulli two-armed bandit problem

Abstract
The paper first establishes monotonicity properties of the posterior success probabilities when the prior success probabilities follow an arbitrary joint distribution (the Bayesian approach). Next, a dynamic programming model is proposed, and monotonicity properties of the optimal expected cumulative discounted reward are proved. Finally, optimality properties are given for the case in which one of the success probabilities is known.
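
As a rough illustration of the kind of dynamic programming recursion the abstract refers to, the following Python sketch computes the optimal expected discounted reward for the special case in which one arm has a known success probability. It is not the paper's model: the Beta(a, b) prior on the unknown arm, the finite horizon n, and the names make_value, lam, and beta are all assumptions introduced here for illustration.

```python
from functools import lru_cache


def make_value(lam, beta):
    """Build a value function for a two-armed Bernoulli bandit.

    Assumptions (not from the paper): arm 1 has known success
    probability `lam`; arm 2 has a Beta(a, b) posterior; rewards are
    discounted by `beta` per step; the horizon is finite.
    """

    @lru_cache(maxsize=None)
    def value(a, b, n):
        # Optimal expected discounted reward with n pulls remaining.
        if n == 0:
            return 0.0
        p = a / (a + b)  # posterior mean success probability of arm 2
        # Pull the known arm: the posterior on arm 2 is unchanged.
        known = lam + beta * value(a, b, n - 1)
        # Pull the unknown arm: update the posterior on success or failure.
        unknown = (p * (1.0 + beta * value(a + 1, b, n - 1))
                   + (1.0 - p) * beta * value(a, b + 1, n - 1))
        return max(known, unknown)

    return value


if __name__ == "__main__":
    v = make_value(lam=0.5, beta=0.9)
    for a, b in [(1, 2), (1, 1), (2, 1)]:
        print(a, b, v(a, b, 10))
```

Under these assumptions the computed value is nondecreasing in the number of observed successes a and in the known probability lam, which is the flavor of monotonicity the abstract describes for the optimal expected cumulative discounted reward.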
