Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms
- 1 January 2002
- journal article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Control and Optimization
- Vol. 41 (1) , 1-22
- https://doi.org/10.1137/s0363012998346621
Abstract
We discuss synchronous and asynchronous iterations of the form xk+1 = x^k + \gamma(k) (h(x^k)+w^k), where h is a suitable map and {wk} is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's lemma for the synchronous case or on Borkar's theorem for the asynchronous case. However, the analysis requires that the iterates {xk} be bounded, a fact which is usually hard to prove. We develop a novel framework for proving boundedness in the deterministic framework, which is also applicable to the stochastic case when the deterministic hypotheses can be verified in the almost sure sense. This is based on scaling ideas and on the properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODEs involving nonexpansive mappings to prove convergence (with probability 1 in the stochastic case). We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and are able to relax some of the assumptions of the currently available results.Keywords
This publication has 15 references indexed in Scilit:
- Learning Algorithms for Markov Decision Processes with Average CostSIAM Journal on Control and Optimization, 2001
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement LearningSIAM Journal on Control and Optimization, 2000
- Asynchronous Stochastic ApproximationsSIAM Journal on Control and Optimization, 1998
- An analog scheme for fixed point computation. I. TheoryIEEE Transactions on Circuits and Systems I: Regular Papers, 1997
- Probability TheoryPublished by Springer Nature ,1995
- White-Noise Representations in Stochastic Realization TheorySIAM Journal on Control and Optimization, 1993
- Rate of Convergence of Recursive EstimatorsSIAM Journal on Control and Optimization, 1992
- An Analysis of Stochastic Shortest Path ProblemsMathematics of Operations Research, 1991
- Adaptive Algorithms and Stochastic ApproximationsPublished by Springer Nature ,1990
- Ergodic Theory of Random TransformationsPublished by Springer Nature ,1986