Technical Note—On the Asymptotic Convergence Rate of Cost Differences for Markovian Decision Processes
- 1 February 1971
- journal article
- Published by Institute for Operations Research and the Management Sciences (INFORMS) in Operations Research
- Vol. 19 (1) , 244-248
- https://doi.org/10.1287/opre.19.1.244
Abstract
The modified method of successive approximations for solving Markovian decision problems as formulated by White, Schweitzer, MacQueen, and Odoni, concentrates attention on cost differences either between successive stages in the same state, or relative to a base state in the same stage, rather than on the total cost function. The former bound the (discounted) gain of the optimal policy, while the latter relative-cost function determines the policy to be chosen at each stage. While these authors have demonstrated that these modified constructs converge to the gain and the optimal relative-cost function under rather general circumstances (undiscounted, single-chain, aperiodic processes), little is known about the rates of convergence. [Note that convergence of the relative-cost function guarantees optimality of a currently repeating policy, as noted by Howard.) A great deal of insight into this mathematically difficult question may be gained by working out the actual asymptotic convergence rates of these constructs for the special case of a single fixed policy. This is an easy exercise via Howward's methods, but very suggestive, since the policy will be asymptotically constant for a well-behaved problem. (In particular, if there is a unique optimal policy it will eventually repeat.) Convergence for both constructs for the fixed-policy case is very powerful even for discount rates greater than 1.0, depending principally on the dominant eigenvalue of the transition matrix. This note discusses the intuitive implications of this fact for the relative efficiencies of modified value iteration, policy iteration, policy iteration via successive approximations, or possible hybrids.Keywords
This publication has 0 references indexed in Scilit: