Markov Decision Problems and State-Action Frequencies
- 1 July 1991
- journal article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Journal on Control and Optimization
- Vol. 29 (4), pp. 786-809
- https://doi.org/10.1137/0329043
Abstract
Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for each state the relative number of uses of each action. These "conditional frequencies," which are defined pathwise, are shown to determine the "state-action frequencies" that, in the finite case, are known to determine the costs. This is extended to the countable case, allowing for unbounded costs. The space of frequencies is shown to be compact and convex, and its extreme points are identified with stationary deterministic policies. Conditions are given under which the search for optimality in several optimization problems may be restricted to stationary policies. These problems include the standard Markov decision process, as well as constrained optimization (both in terms of average cost functionals) and variability-sensitive optimization. An application to a queueing problem is given, where these results imply the existence and explicit computation of optimal policies in constrained optimization problems. The pathwise definition of the conditional frequencies implies that their values can be controlled directly; moreover, they depend only on the limiting behavior of the control. This has immediate application to adaptive control of Markov chains, including adaptive control under constraints.
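To make the central objects concrete, the following is a small numerical sketch, not taken from the paper: on a made-up two-state, two-action chain run under a stationary randomized policy, the empirical state-action frequencies f(s, a) are computed pathwise, the per-state conditional frequencies recover the policy's action probabilities, and the long-run average cost along the path coincides with the frequency-weighted cost. All transition probabilities, costs, and the policy here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# P[s, a] = next-state distribution under action a in state s (made-up numbers)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
c = np.array([[1.0, 4.0], [2.0, 0.5]])   # per-step cost c(s, a)
pi = np.array([[0.7, 0.3], [0.4, 0.6]])  # stationary policy pi(a | s)

# Simulate one long path and count state-action visits.
T = 200_000
counts = np.zeros((n_states, n_actions))
total_cost = 0.0
s = 0
for _ in range(T):
    a = rng.choice(n_actions, p=pi[s])
    counts[s, a] += 1
    total_cost += c[s, a]
    s = rng.choice(n_states, p=P[s, a])

# Pathwise state-action frequencies and conditional frequencies.
freq = counts / T
cond = counts / counts.sum(axis=1, keepdims=True)

print("state-action frequencies:\n", np.round(freq, 3))
print("conditional frequencies (approx pi):\n", np.round(cond, 3))
# Average cost along the path equals the frequency-weighted cost.
print(total_cost / T, (freq * c).sum())
```

The identity in the last line is immediate here (both sides are the same sum), but it illustrates why the frequencies are the "basic quantities" for average cost functionals: any policy's average cost is a linear functional of its state-action frequencies.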