Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space
- 1 March 1989
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Automatic Control
- Vol. 34 (3), 258-267
- https://doi.org/10.1109/9.16415
Abstract
The authors consider a controlled i.i.d. (independent and identically distributed) process whose distribution is parametrized by an unknown parameter θ belonging to a known parameter space Θ, together with a one-step reward associated with each pair of a control and the following state of the process. The objective is to maximize the expected value of the sum of one-step rewards over an infinite horizon. By introducing the loss associated with a control scheme, it is shown that this problem is equivalent to minimizing the loss. Uniformly good adaptive control schemes are defined and emphasized. A lower bound on the loss associated with any uniformly good control scheme is developed. Finally, an adaptive control scheme is constructed whose loss equals the lower bound and which is therefore asymptotically efficient.
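The abstract does not state the loss explicitly. One common way to formalize it, following the convention of the allocation-rule literature this record cites, is sketched below. The notation is assumed for illustration and is not taken from the paper: a finite control set $\mathcal{U}$, the control $U_t$ applied at time $t$, the resulting state $X_t$, a reward function $r(u, x)$, and a horizon $n$.

```latex
% Illustrative notation only; the symbols are assumptions, not the paper's own.
% Expected one-step reward of control u when the true parameter is \theta:
\[
  \mu(u, \theta) = \mathbb{E}_{\theta}\bigl[\, r(u, X) \mid U = u \,\bigr],
  \qquad
  \mu^{*}(\theta) = \max_{u \in \mathcal{U}} \mu(u, \theta).
\]
% Loss of a control scheme after n steps: the shortfall relative to always
% applying a control that is optimal for the true parameter.
\[
  L_n(\theta) = n\,\mu^{*}(\theta)
    - \mathbb{E}_{\theta}\Bigl[\, \sum_{t=1}^{n} r(U_t, X_t) \Bigr].
\]
% "Uniformly good" in the usual sense: the loss grows more slowly than any
% positive power of n, whatever the true parameter is.
\[
  L_n(\theta) = o(n^{a})
  \quad \text{for every } a > 0 \text{ and every } \theta \in \Theta .
\]
```

Under this reading, $n\,\mu^{*}(\theta)$ does not depend on the control scheme, so maximizing the expected sum of rewards is the same as minimizing $L_n(\theta)$. This matches the equivalence the abstract describes.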
This publication has 6 references indexed in Scilit:
- Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Transactions on Automatic Control, 1988
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards. IEEE Transactions on Automatic Control, 1987
- Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985
- Entropy, Large Deviations, and Statistical Mechanics. Published by Springer Nature, 1985
- The Minimax Risk for the Two-Armed Bandit Problem. Published by Springer Nature, 1983
- An Asymptotic Minimax Theorem for the Two Armed Bandit Problem. The Annals of Mathematical Statistics, 1960