On the Convergence of Policy Iteration in Finite State Undiscounted Markov Decision Processes: The Unichain Case
- 1 February 1987
- journal article
- Published by Institute for Operations Research and the Management Sciences (INFORMS) in Mathematics of Operations Research
- Vol. 12 (1) , 163-176
- https://doi.org/10.1287/moor.12.1.163
Abstract
We study the convergence of policy iteration for the undiscounted, finite state, discrete time Markov decision problem with compact action space and unichain transition structure. Using a “Newton Method type” representation for policy iteration, we establish the existence of a solution to the optimality equation. We show that to find an average optimal policy, it is sufficient to solve the optimality equation on the recurrent set of the maximizing policy. Under the additional assumption of a unique maximizing policy at each stage of the policy iteration procedure, we show that the iterates are convergent and the resulting policy is Blackwell optimal.Keywords
This publication has 0 references indexed in Scilit: