Machine learning and nonparametric bandit theory
- 1 July 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Automatic Control
- Vol. 40 (7), 1199-1209
- https://doi.org/10.1109/9.400491
Abstract
In its most basic form, bandit theory is concerned with the design problem of sequentially choosing members from a given collection of random variables so that the regret, i.e., R_n = Σ_j (μ* − μ_j) E[T_n(j)], grows as slowly as possible with increasing n. Here μ_j is the expected value of the bandit arm (i.e., random variable) indexed by j, T_n(j) is the number of times arm j has been selected in the first n decision stages, and μ* = sup_j μ_j. The present paper contributes to the theory by considering the situation in which observations are dependent. To begin with, the dependency is presumed to depend only on past observations of the same arm, but later, we allow that it may be with respect to the entire past and that the set of arms is infinite. This brings queues and, more generally, controlled Markov processes into our purview. Thus our "black-box" methodology is suitable for the case when the only observables are cost values and, in particular, the probability structure and loss function are unknown to the designer. The conclusion of the analysis is that under lenient conditions, using algorithms prescribed herein, risk growth is commensurate with that in the simplest i.i.d. cases. Our methods represent an alternative to stochastic-approximation/perturbation-analysis ideas for tuning queues.
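To make the regret quantity concrete, the following is a minimal illustrative sketch, not the algorithm prescribed in the paper: a sample-mean "play the leader" rule with sparse forced exploration, applied to i.i.d. Gaussian arms (the paper's setting allows dependent observations and unknown probability structure). The arm means, reward distribution, and exploration schedule below are assumptions introduced only for this example; the script estimates R_n = Σ_j (μ* − μ_j) E[T_n(j)] by averaging pull counts over Monte Carlo runs.

```python
import random

def run_bandit(means, n, seed=0):
    """Run one trajectory of length n; return the per-arm pull counts T_n(j)."""
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k

    def sample(j):
        # i.i.d. Gaussian rewards are an assumption for this sketch;
        # the paper treats dependent observations (e.g., queue outputs).
        return rng.gauss(means[j], 1.0)

    # Pull each arm once to initialize the sample means.
    for j in range(k):
        sums[j] += sample(j)
        counts[j] += 1

    for t in range(k, n):
        if rng.random() < min(1.0, k / (t + 1)):
            # Forced exploration at a rate decaying like 1/t.
            j = rng.randrange(k)
        else:
            # Otherwise play the empirical leader (largest sample mean).
            j = max(range(k), key=lambda i: sums[i] / counts[i])
        sums[j] += sample(j)
        counts[j] += 1
    return counts

def expected_regret(means, n, runs=200):
    """Estimate R_n = sum_j (mu* - mu_j) * E[T_n(j)] by averaging pull counts."""
    mu_star = max(means)
    avg_counts = [0.0] * len(means)
    for r in range(runs):
        counts = run_bandit(means, n, seed=r)
        for j, c in enumerate(counts):
            avg_counts[j] += c / runs
    return sum((mu_star - means[j]) * avg_counts[j] for j in range(len(means)))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(expected_regret([0.2, 0.5, 0.9], n), 2))
```

With the 1/t exploration schedule, the number of forced exploratory pulls grows only logarithmically in n, which is the kind of slow regret growth the abstract refers to; the paper's contribution is showing that comparable growth rates can be achieved even when observations are dependent and the loss structure is unknown.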