Extension of the multi-armed bandit problem
- 1 January 1983
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 22, 1179-1180
- https://doi.org/10.1109/cdc.1983.269708
Abstract
There are N independent machines. Machine i is described by a sequence {Xi(s), Fi(s)} where xi(s) is the immediate reward and Fi(s) is the information available before i is operated for the sth time. At each time one operates exactly one machine; idle machines remain frozen. The problem is to schedule the operation of the machines so as to maximize the expected total discounted sequence of rewards. The main result is that to each machine is associated an index, and the optimal policy operates the machine with the largest current index.Keywords
This publication has 4 references indexed in Scilit:
- Extensions of the multiarmed bandit problem: The discounted caseIEEE Transactions on Automatic Control, 1985
- Multi-Armed Bandits and the Gittins IndexJournal of the Royal Statistical Society Series B: Statistical Methodology, 1980
- Bandit Processes and Dynamic Allocation IndicesJournal of the Royal Statistical Society Series B: Statistical Methodology, 1979
- On Bayesian models in stochastic schedulingJournal of Applied Probability, 1977