The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems
- 1 November 1977
- journal article
- Published by Institute for Operations Research and the Management Sciences (INFORMS) in Mathematics of Operations Research
- Vol. 2 (4) , 360-381
- https://doi.org/10.1287/moor.2.4.360
Abstract
This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs minus n times the long-run average expected reward has a finite limit as n → ∞ for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
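A minimal sketch of the quantity the abstract studies, written as a hypothetical illustration rather than the paper's method or its conditions. It runs plain undiscounted value iteration, v_{k+1}(i) = max_a [ r(i,a) + Σ_j p_ij(a) v_k(j) ], starting from a final reward vector v_0, and reports v_k - k·g for a gain vector g that the caller supplies. The function names, the two toy MDPs, and the simplifying assumption that every action is available in every state are all illustrative choices, not taken from the source.

```python
from typing import List

Matrix = List[List[float]]

# Assumed encoding for this sketch (every action is available in every state):
#   P[a][i][j] = probability of moving from state i to state j under action a
#   r[a][i]    = one-step expected reward of action a in state i


def value_iteration(P: List[Matrix], r: Matrix, v0: List[float], n: int) -> Matrix:
    """Return [v_0, ..., v_n], where v_0 is the final (terminal) reward vector and
    v_{k+1}(i) = max_a [ r[a][i] + sum_j P[a][i][j] * v_k(j) ]  (no discounting)."""
    S = len(v0)
    v = [float(x) for x in v0]
    values = [v]
    for _ in range(n):
        v = [max(r[a][i] + sum(P[a][i][j] * v[j] for j in range(S))
                 for a in range(len(P)))
             for i in range(S)]
        values.append(v)
    return values


def centered_values(P: List[Matrix], r: Matrix, v0: List[float],
                    g: List[float], n: int) -> Matrix:
    """Return v_k - k*g for k = 0..n; g is a gain vector supplied by the caller."""
    return [[vk[i] - k * g[i] for i in range(len(g))]
            for k, vk in enumerate(value_iteration(P, r, v0, n))]


# Toy MDP 1 (hypothetical, aperiodic): the optimal gain is g = (2, 2).
P_a = [[[0.5, 0.5], [0.5, 0.5]],   # action 0: jump to a uniformly random state
       [[1.0, 0.0], [0.0, 1.0]]]   # action 1: stay put
r_a = [[1.0, 0.0],                 # rewards of action 0 in states 0, 1
       [0.0, 2.0]]                 # rewards of action 1 in states 0, 1
print(centered_values(P_a, r_a, [0.0, 0.0], [2.0, 2.0], 60)[-1])  # close to [-2.0, 0.0]

# Toy MDP 2 (hypothetical, periodic): one action, deterministic cycle 0 -> 1 -> 0,
# reward 1 in state 0 and 0 in state 1, so the gain is g = (0.5, 0.5).
P_c = [[[0.0, 1.0], [1.0, 0.0]]]
r_c = [[1.0, 0.0]]
print(centered_values(P_c, r_c, [0.0, 0.0], [0.5, 0.5], 4))
# [[0.0, 0.0], [0.5, -0.5], [0.0, 0.0], [0.5, -0.5], [0.0, 0.0]]
print(centered_values(P_c, r_c, [0.5, 0.0], [0.5, 0.5], 4))
# [[0.5, 0.0], [0.5, 0.0], [0.5, 0.0], [0.5, 0.0], [0.5, 0.0]]
```

In the aperiodic toy problem the sequence settles at (-2, 0). In the periodic cycle it oscillates when v_0 = (0, 0) but is constant when v_0 = (0.5, 0), so in this toy case whether v_n - n·g converges depends on the final reward vector. That illustrates why both the final reward vector and periodicity enter the question, without reproducing the paper's actual conditions.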