Gradient approach for recursive estimation and control in finite Markov chains
- 1 December 1981
- journal article
- Published by Cambridge University Press (CUP) in Advances in Applied Probability
- Vol. 13 (4), 778-803
- https://doi.org/10.2307/1426973
Abstract
The problem studied is that of controlling a finite Markov chain so as to maximize the long-run expected reward per unit time. The chain's transition probabilities depend upon an unknown parameter taking values in a subset [a, b] of R^n. A control policy is defined as the probability of selecting a control action for each state of the chain. Derived is a Taylor-like expansion formula for the expected reward in terms of policy variations. Based on that result, a recursive stochastic gradient algorithm is presented for the adaptation of the control policy at consecutive times. The gradient depends on the estimated transition parameter which is also recursively updated using the gradient of the likelihood function. Convergence with probability 1 is proved for the control and estimation algorithms.
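To make the objective in the abstract concrete, the following is a minimal sketch of how the long-run expected reward per unit time can be evaluated for a randomized policy. It is not the paper's recursive algorithm: it assumes the transition probabilities are known (no unknown parameter is estimated) and that the chain induced by the policy is ergodic and aperiodic, so that power iteration converges to the stationary distribution. The names `policy_chain`, `stationary_distribution`, and `average_reward`, the array layout, and the two-state numbers are all hypothetical and chosen only for illustration.

```python
def policy_chain(P, r, policy):
    """Mix per-action transitions and rewards under a randomized policy.

    P[a][i][j]   : assumed known probability of moving i -> j under action a
    r[i][a]      : assumed reward for taking action a in state i
    policy[i][a] : probability of selecting action a in state i
    """
    n_states = len(policy)
    n_actions = len(P)
    P_pi = [
        [sum(policy[i][a] * P[a][i][j] for a in range(n_actions))
         for j in range(n_states)]
        for i in range(n_states)
    ]
    r_pi = [
        sum(policy[i][a] * r[i][a] for a in range(n_actions))
        for i in range(n_states)
    ]
    return P_pi, r_pi


def stationary_distribution(P_pi, iters=10_000):
    """Approximate the stationary distribution by power iteration.

    Assumes the induced chain is ergodic and aperiodic.
    """
    n = len(P_pi)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P_pi[i][j] for i in range(n)) for j in range(n)]
    return mu


def average_reward(P, r, policy):
    """Long-run expected reward per unit time under the given policy."""
    P_pi, r_pi = policy_chain(P, r, policy)
    mu = stationary_distribution(P_pi)
    return sum(m * x for m, x in zip(mu, r_pi))


# Hypothetical two-state, two-action example.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],  # transitions under action 1
]
r = [[1.0, 0.0], [0.0, 2.0]]   # r[state][action]

always_0 = [[1.0, 0.0], [1.0, 0.0]]
always_1 = [[0.0, 1.0], [0.0, 1.0]]
mixed = [[0.5, 0.5], [0.5, 0.5]]

for name, pol in [("always_0", always_0), ("always_1", always_1), ("mixed", mixed)]:
    print(name, average_reward(P, r, pol))
```

Comparing the value of `average_reward` across nearby policies gives a rough numerical sense of the policy variations that the abstract's expansion formula characterizes analytically.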
This publication has 4 references indexed in Scilit:
- Strong consistency of a modified maximum likelihood estimator for controlled Markov chains. Journal of Applied Probability, 1980
- Estimation and control in Markov chains. Advances in Applied Probability, 1974
- Some Classes of Multi-Input Automata. Journal of Cybernetics, 1972
- An adaptive automaton controller for discrete-time Markov processes. Automatica, 1969