Markov Decision Problems and State-Action Frequencies
- 1 July 1991
- journal article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Control and Optimization
- Vol. 29 (4), pp. 786-809
- https://doi.org/10.1137/0329043
Abstract
Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for each state the relative number of uses of each action. These "conditional frequencies," which are defined pathwise, are shown to determine the "state-action frequencies" that, in the finite case, are known to determine the costs. This is extended to the countable case, allowing for unbounded costs. The space of frequencies is shown to be compact and convex, and its extreme points are identified with stationary deterministic policies. Conditions are given under which the search for optimality in several optimization problems may be restricted to stationary policies. These problems include the standard Markov decision process, as well as constrained optimization (both in terms of average cost functionals) and variability-sensitive optimization. An application to a queueing problem is given, where these results imply the existence and explicit computation of optimal policies in constrained optimization problems. The pathwise definition of the conditional frequencies implies that their values can be controlled directly; moreover, they depend only on the limiting behavior of the control. This has immediate application to adaptive control of Markov chains, including adaptive control under constraints.
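To make the central objects concrete, the following is a small numerical sketch, not taken from the paper: on a made-up two-state, two-action chain run under a stationary randomized policy, the empirical state-action frequencies f(s, a) are computed pathwise, the per-state conditional frequencies recover the policy's action probabilities, and the long-run average cost along the path coincides with the frequency-weighted cost. All transition probabilities, costs, and the policy here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# P[s, a] = next-state distribution under action a in state s (made-up numbers)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
c = np.array([[1.0, 4.0], [2.0, 0.5]])   # per-step cost c(s, a)
pi = np.array([[0.7, 0.3], [0.4, 0.6]])  # stationary policy pi(a | s)

# Simulate one long path and count state-action visits.
T = 200_000
counts = np.zeros((n_states, n_actions))
total_cost = 0.0
s = 0
for _ in range(T):
    a = rng.choice(n_actions, p=pi[s])
    counts[s, a] += 1
    total_cost += c[s, a]
    s = rng.choice(n_states, p=P[s, a])

# Pathwise state-action frequencies and conditional frequencies.
freq = counts / T
cond = counts / counts.sum(axis=1, keepdims=True)

print("state-action frequencies:\n", np.round(freq, 3))
print("conditional frequencies (approx pi):\n", np.round(cond, 3))
# Average cost along the path equals the frequency-weighted cost.
print(total_cost / T, (freq * c).sum())
```

The identity in the last line is immediate here (both sides are the same sum), but it illustrates why the frequencies are the "basic quantities" for average cost functionals: any policy's average cost is a linear functional of its state-action frequencies.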