Simulation-based optimization of Markov reward processes
- 1 February 2001
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Automatic Control
- Vol. 46 (2), 191-209
- https://doi.org/10.1109/9.905687
Abstract
This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided.
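To illustrate the general idea described in the abstract, here is a minimal sketch, not the paper's exact algorithm: a two-state Markov reward process whose transition probabilities depend on a scalar parameter, optimized by a likelihood-ratio (score-function) gradient estimate accumulated over regenerative cycles of a single simulated sample path. All names, the parametrization, the step sizes, and the reward values are hypothetical choices for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical two-state chain: theta parametrizes the probability of
# staying in state 0; transitions out of state 1 are fixed at 1/2 each.
def transition_probs(theta, state):
    p = sigmoid(theta)
    return np.array([p, 1.0 - p]) if state == 0 else np.array([0.5, 0.5])

def grad_log_prob(theta, state, next_state):
    # d/dtheta of log P(next_state | state); nonzero only for the
    # parametrized row (state 0).
    if state == 0:
        p = sigmoid(theta)
        return (1.0 - p) if next_state == 0 else -p
    return 0.0

rewards = np.array([1.0, 0.0])  # reward 1 in state 0, 0 in state 1

def optimize(theta0=0.0, n_cycles=5000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    lam = 0.5  # running estimate of the average reward (baseline)
    for _ in range(n_cycles):
        # Simulate one regenerative cycle: start at the recurrent
        # state 0 and run until the chain returns to it.
        z = 0.0        # accumulated score (eligibility)
        grad = 0.0     # gradient estimate for this cycle
        cycle_reward = 0.0
        steps = 0
        s = 0
        while True:
            s_next = rng.choice(2, p=transition_probs(theta, s))
            z += grad_log_prob(theta, s, s_next)
            cycle_reward += rewards[s_next]
            grad += (rewards[s_next] - lam) * z
            steps += 1
            s = s_next
            if s == 0:
                break
        # Online updates at the end of each regenerative cycle.
        theta += step * grad
        lam += 0.1 * (cycle_reward / steps - lam)
    return theta, lam

theta, lam = optimize()
```

Since state 0 yields the higher reward, gradient ascent should push `theta` up (making the chain stay in state 0 more often) and drive the average-reward estimate toward 1. Resetting the eligibility `z` at each regeneration is what keeps the single-sample-path estimator's variance bounded.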