Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition
- 1 March 1991
- journal article
- research article
- Published by Cambridge University Press (CUP) in Journal of Applied Probability
- Vol. 28 (04), 779-790
- https://doi.org/10.1017/s0021900200042698
Abstract
We consider the adaptive control of Markov chains under the weak accessibility condition with a view to minimizing the learning loss. A certainty equivalence control with a forcing scheme is constructed. We use a stationary randomized control scheme for forcing and compute a maximum likelihood estimate of the unknown parameter from the resulting observations. We obtain an exponential upper bound on the rate of decay of the probability of error. This allows us to choose the rate of forcing appropriately, whereby we achieve an o(f(n) log n) learning loss for any function f(n) → ∞ as n → ∞.
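The scheme outlined in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-state, two-action chain, the two candidate parameters `theta_a` and `theta_b`, the one-step greedy rule standing in for the optimal stationary policy, and the decaying forcing schedule in `forcing_probability`. The sketch shows only the general pattern: occasionally force a randomized action, otherwise act as if the current maximum likelihood estimate were the true parameter.

```python
import math
import random

# Hypothetical toy model: P[theta][action][state] is the probability of
# moving to state 1. Being in state 1 is the desirable outcome.
P = {
    "theta_a": [[0.2, 0.3], [0.7, 0.8]],
    "theta_b": [[0.7, 0.8], [0.2, 0.3]],
}


def forcing_probability(n):
    """Illustrative decaying forcing rate (an assumption, not the paper's)."""
    return min(1.0, 5.0 * math.log(n + 2) / (n + 1))


def greedy_action(theta, state):
    """One-step greedy stand-in for the optimal policy under theta."""
    return max((0, 1), key=lambda action: P[theta][action][state])


def simulate(true_theta, steps, seed=0):
    """Run certainty equivalence control with forcing on the toy chain.

    Returns the final maximum likelihood estimate and the number of
    forced (randomized) steps.
    """
    rng = random.Random(seed)
    log_lik = {theta: 0.0 for theta in P}  # running log-likelihoods
    state, forced = 0, 0
    for n in range(steps):
        if rng.random() < forcing_probability(n):
            # Forcing step: uniformly randomized control.
            action = rng.randrange(2)
            forced += 1
        else:
            # Certainty equivalence step: trust the current estimate.
            estimate = max(log_lik, key=log_lik.get)
            action = greedy_action(estimate, state)
        p1 = P[true_theta][action][state]
        next_state = 1 if rng.random() < p1 else 0
        # Update each candidate's log-likelihood with the new transition.
        for theta in P:
            q1 = P[theta][action][state]
            log_lik[theta] += math.log(q1 if next_state == 1 else 1.0 - q1)
        state = next_state
    return max(log_lik, key=log_lik.get), forced
```

Calling `simulate("theta_a", 2000)` returns the final estimate together with the count of forced steps. In this toy setting the forced steps are the main source of learning loss, which is why the forcing rate is the quantity to tune.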
This publication has 8 references indexed in Scilit:
- Certainty equivalence control with forcing: revisited. Systems & Control Letters, 1989
- Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space. IEEE Transactions on Automatic Control, 1989
- Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Transactions on Automatic Control, 1988
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards. IEEE Transactions on Automatic Control, 1987
- A Survey of Some Results in Stochastic Adaptive Control. SIAM Journal on Control and Optimization, 1985
- Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 1985
- Optimal decision procedures for finite Markov chains. Part II: Communicating systems. Advances in Applied Probability, 1973
- Perturbation theory and finite Markov chains. Journal of Applied Probability, 1968