Extension of the multi-armed bandit problem

1 January 1983

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 22, 1179-1180
https://doi.org/10.1109/cdc.1983.269708

Abstract

There are N independent machines. Machine i is described by a sequence {Xi(s), Fi(s)} where xi(s) is the immediate reward and Fi(s) is the information available before i is operated for the sth time. At each time one operates exactly one machine; idle machines remain frozen. The problem is to schedule the operation of the machines so as to maximize the expected total discounted sequence of rewards. The main result is that to each machine is associated an index, and the optimal policy operates the machine with the largest current index.

Keywords

LABORATORIES

This publication has 4 references indexed in Scilit:

Extensions of the multiarmed bandit problem: The discounted case
IEEE Transactions on Automatic Control, 1985
Multi-Armed Bandits and the Gittins Index
Journal of the Royal Statistical Society Series B: Statistical Methodology, 1980
Bandit Processes and Dynamic Allocation Indices
Journal of the Royal Statistical Society Series B: Statistical Methodology, 1979
On Bayesian models in stochastic scheduling
Journal of Applied Probability, 1977