The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems

1 November 1977

journal article
Published by Institute for Operations Research and the Management Sciences (INFORMS) in Mathematics of Operations Research

Vol. 2 (4), 360-381
https://doi.org/10.1287/moor.2.4.360

Abstract

This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions that guarantee that the maximal total expected reward for a planning horizon of n epochs, minus n times the long-run average expected reward, has a finite limit as n → ∞ for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
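
The quantity studied in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the standard recursion v_{k+1}(s) = max_a [ r(s, a) + sum_j P(j | s, a) v_k(j) ], where v_0 is the final reward vector, and it tracks v_n - n g, where g is the maximal gain. The function names, the two toy problems, and the hand-supplied gain vectors are all hypothetical and chosen only so the arithmetic is easy to check.

```python
def value_iteration(P, r, v0, n):
    """Return v_n, where v_{k+1}(s) = max_a [r[s][a] + sum_j P[s][a][j] * v_k(j)].

    P[s][a] is the list of transition probabilities out of state s under
    action a, r[s][a] is the one-step reward, and v0 is the final reward vector.
    """
    v = list(v0)
    for _ in range(n):
        v = [
            max(
                r[s][a] + sum(p * v[j] for j, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
    return v


def deviation(P, r, v0, gain, n):
    """Return v_n - n * gain for each initial state (gain is supplied by hand)."""
    v = value_iteration(P, r, v0, n)
    return [v[s] - n * gain[s] for s in range(len(v))]


# Hypothetical aperiodic problem: state 0 has two actions, state 1 has one.
# The maximal gain is 0.5 in both states (assumed known here, not computed).
P_aper = [[[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.5]]]
r_aper = [[1.0, 0.2], [0.0]]
gain_aper = [0.5, 0.5]

# Hypothetical periodic problem: a deterministic two-cycle with gain 1.
P_per = [[[0.0, 1.0]], [[1.0, 0.0]]]
r_per = [[2.0], [0.0]]
gain_per = [1.0, 1.0]

if __name__ == "__main__":
    for n in range(1, 7):
        print(
            n,
            deviation(P_aper, r_aper, [0.0, 0.0], gain_aper, n),
            deviation(P_per, r_per, [0.0, 0.0], gain_per, n),
        )
```

In the aperiodic toy problem the deviation settles at [0.5, -0.5], so the limit exists. In the periodic one it alternates between [1.0, -1.0] and [0.0, 0.0], so the limit exists only along every second step. This is the kind of periodicity effect that the conditions in the paper address.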

Keywords

BEHAVIOR
EXPECTED REWARD
DECISION PROBLEMS
MARKOV DECISION
VALUE ITERATION
UNDISCOUNTED VALUE
STEP
ASYMPTOTIC
