Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- 1 January 1999
- journal article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Control and Optimization
- Vol. 38 (1), 94-123
- https://doi.org/10.1137/s036301299731669x
Abstract
Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two-time-scale stochastic approximations.
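To make the two-time-scale structure concrete, the sketch below shows a tabular actor-critic loop driven by simulated transitions: the critic's value estimates are updated with a faster-decaying step size than the actor's policy preferences, so the critic effectively tracks the slowly changing policy. This is only an illustrative sketch, not the paper's exact algorithm or analysis setting: the randomly generated discounted MDP, the softmax policy parameterization, and the specific step-size schedules are assumptions made here for demonstration.

```python
# Minimal two-time-scale actor-critic sketch on a randomly generated MDP.
# Illustrative only: a tabular, discounted variant with a softmax policy,
# not the specific algorithm or step-size conditions analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.95

# Random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

V = np.zeros(n_states)                   # critic: state-value estimates
theta = np.zeros((n_states, n_actions))  # actor: policy preferences

def policy(s):
    prefs = np.exp(theta[s] - theta[s].max())
    return prefs / prefs.sum()

s = 0
for t in range(1, 50_000):
    a_fast = 1.0 / t**0.6   # critic step size (fast time scale)
    b_slow = 1.0 / t        # actor step size (slow time scale)

    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update on the fast time scale.
    delta = r + gamma * V[s_next] - V[s]
    V[s] += a_fast * delta

    # Actor: policy-gradient-style update on the slow time scale,
    # nudging the chosen action's preference in proportion to the TD error.
    grad = -pi
    grad[a] += 1.0
    theta[s] += b_slow * delta * grad

    s = s_next

print("Learned values:", np.round(V, 2))
print("Greedy actions:", theta.argmax(axis=1))
```

The key design choice mirrored here is the step-size separation: because a_fast decays more slowly than b_slow, the critic converges "quickly" relative to the actor, which is what the two-time-scale stochastic approximation argument exploits.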