Recursive Algorithms for Adaptive Control of Finite Markov Chains
- 1 January 1981
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics
- Vol. 11 (2), 135-144
- https://doi.org/10.1109/tsmc.1981.4308638
Abstract
The problem of controlling a finite Markov chain so as to maximize the long-run expected reward per unit time is studied. The chain's transition probabilities depend upon an unknown parameter taking values in a subset [a,b] of R^n. A control policy is defined as the probability of selecting a control action for each state of the chain. A Taylor-like expansion formula for the expected reward in terms of policy variations is derived. Based on that result, a recursive stochastic gradient algorithm is presented for the adaptation of the control policy at consecutive times. The gradient depends on the estimated transition parameter, which is also recursively updated using the gradient of the likelihood function. Convergence with probability 1 is proved for the control and estimation algorithms.
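The coupled recursion described in the abstract can be illustrated with a minimal sketch: a two-state, two-action chain whose transition probabilities depend on an unknown scalar parameter, a likelihood-gradient update for the parameter estimate projected back into [a,b], and a stochastic-gradient update for the randomized policy. The parametrization, reward values, step sizes, and projection bounds below are illustrative assumptions, not the paper's construction.

```python
import random

random.seed(0)

# Hypothetical instance: 2 states, 2 actions, unknown scalar theta in [a, b].
a, b = 0.1, 0.9
theta_true = 0.7

def p_to_state1(action, theta):
    # Assumed parametrization: action 1 moves to state 1 w.p. theta,
    # action 0 w.p. 1 - theta (independent of the current state, for brevity).
    return theta if action == 1 else 1.0 - theta

reward = [[0.0, 1.0],   # reward[state][action], hypothetical values
          [1.0, 0.2]]

def clip(x, lo, hi):
    return max(lo, min(hi, x))

policy = [0.5, 0.5]      # probability of choosing action 1 in each state
theta_hat = 0.5          # recursive estimate of the unknown parameter
state = 0

for n in range(1, 20001):
    gamma = 1.0 / n      # decreasing stochastic-approximation gain

    # Sample an action from the randomized policy, then observe a transition.
    action = 1 if random.random() < policy[state] else 0
    nxt = 1 if random.random() < p_to_state1(action, theta_true) else 0

    # Likelihood-gradient step for the parameter, projected back into [a, b].
    p1 = p_to_state1(action, theta_hat)
    observed_p = p1 if nxt == 1 else 1.0 - p1
    sign = 1.0 if (nxt == action) else -1.0   # d(observed_p)/d(theta)
    theta_hat = clip(theta_hat + gamma * sign / observed_p, a, b)

    # Score-function gradient step for the policy, kept away from the boundary
    # so every action retains positive probability.
    p_act = policy[state]
    score = (action - p_act) / (p_act * (1.0 - p_act))
    policy[state] = clip(p_act + gamma * reward[state][action] * score, 0.05, 0.95)

    state = nxt

print(f"estimated parameter: {theta_hat:.2f}")
```

With decreasing gains of order 1/n, the parameter estimate settles near the true value and the policy drifts toward the higher-reward action in each state; the paper's actual algorithms and convergence conditions are given in the full text.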
This publication has 5 references indexed in Scilit:
- Strong consistency of a modified maximum likelihood estimator for controlled Markov chains. Journal of Applied Probability, 1980
- Estimation and control in Markov chains. Advances in Applied Probability, 1974
- Some classes of multi-input automata. Journal of Cybernetics, 1972
- A convergence theorem for non negative almost supermartingales and some applications. Published by Elsevier, 1971
- An adaptive automaton controller for discrete-time Markov processes. Automatica, 1969