Estimation and control in discounted stochastic dynamic programming
- 1 January 1987
- journal article
- research article
- Published by Taylor & Francis in Stochastics
- Vol. 20 (1), 51-71
- https://doi.org/10.1080/17442508708833435
Abstract
The principle of estimation and control was introduced and studied independently by Kurano and Mandl under the average return criterion for models in which some of the data depend on an unknown parameter. Kurano and Mandl considered Markov decision models with finite state space and bounded rewards, and established conditions for the existence of a policy, based on a consistent estimator of the unknown parameter, that is optimal uniformly in the parameter. These results were extended by Kolonko to semi-Markov models with denumerable state space and unbounded rewards. The present paper considers the same principle of estimation and control for the discounted return criterion. The underlying semi-Markov decision model may have a denumerable state space and unbounded rewards. Conditions are established for the existence of a policy which is asymptotically discount optimal uniformly in the unknown parameter. The essential conditions are continuity and compactness conditions and a multiplicative form of the Foster criterion for positive recurrence of Markov chains, formulated here for Markov decision models. An application to the control of an M|G|1 queue is discussed.
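For orientation, the sketch below illustrates the plug-in ("estimation and control") idea in its simplest form: at each stage the unknown parameter is re-estimated from the observed history with a consistent estimator, and the controller then acts as if the estimate were the true parameter. This is a minimal discrete-time, finite-state, finite-parameter sketch; the estimator, the model, and all names in it are assumptions made for illustration and do not reproduce the paper's semi-Markov construction with denumerable state space and unbounded rewards.

```python
# Minimal sketch of the plug-in ("estimation and control") idea for a
# discounted Markov decision model with an unknown parameter, assuming a
# finite state space and a finite set of candidate parameters.  At each stage
# the parameter is re-estimated from the observed transitions and the
# controller acts optimally for the estimated model.
import numpy as np


def value_iteration(P, r, beta, tol=1e-8):
    """Return a greedy optimal stationary policy for a finite discounted MDP.

    P[a] is the transition matrix of action a, r[a] the reward vector of
    action a, and beta the discount factor in (0, 1).
    """
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    while True:
        q = np.array([r[a] + beta * P[a] @ v for a in range(n_actions)])
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return q.argmax(axis=0)
        v = v_new


def estimate_parameter(counts, models):
    """Pick the candidate parameter with the largest likelihood of the
    observed transition counts (a consistent estimator over a finite
    parameter set, used here purely for illustration)."""
    def log_lik(P):
        with np.errstate(divide="ignore", invalid="ignore"):
            ll = counts * np.log(np.stack(P))
        return np.where(counts > 0, ll, 0.0).sum()
    return max(models, key=lambda theta: log_lik(models[theta]))


def estimation_and_control(true_theta, models, rewards, beta, n_steps, rng):
    """Certainty-equivalence loop: estimate the parameter, then apply an
    action that is optimal for the estimated model."""
    P_true = models[true_theta]
    n_states = P_true[0].shape[0]
    counts = np.zeros((len(P_true), n_states, n_states))
    state, theta_hat = 0, None
    for _ in range(n_steps):
        theta_hat = estimate_parameter(counts, models)
        policy = value_iteration(models[theta_hat], rewards, beta)
        action = policy[state]
        next_state = rng.choice(n_states, p=P_true[action][state])
        counts[action, state, next_state] += 1
        state = next_state
    return theta_hat


# Usage: two candidate parameters, two states, two actions.  The final
# estimate typically identifies the true parameter once the informative
# action has been played often enough.
rng = np.random.default_rng(0)
models = {
    "theta1": [np.array([[0.9, 0.1], [0.2, 0.8]]),
               np.array([[0.5, 0.5], [0.5, 0.5]])],
    "theta2": [np.array([[0.1, 0.9], [0.8, 0.2]]),
               np.array([[0.5, 0.5], [0.5, 0.5]])],
}
rewards = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
print(estimation_and_control("theta2", models, rewards,
                             beta=0.9, n_steps=200, rng=rng))
```

The paper's question is when such a plug-in policy is asymptotically discount optimal uniformly in the unknown parameter; the sketch only shows the mechanics of the loop, not the continuity, compactness, and multiplicative Foster conditions under which that optimality holds.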
This publication has 9 references indexed in Scilit:
- Adaptive control of Markov chains, I: Finite parameter set. IEEE Transactions on Automatic Control, 1979
- Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion. Stochastic Processes and their Applications, 1979
- A characterization of geometric ergodicity. Probability Theory and Related Fields, 1979
- Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie. Published by Walter de Gruyter GmbH, 1978
- Markov decision processes and strongly excessive functions. Stochastic Processes and their Applications, 1978
- On Dynamic Programming with Unbounded Rewards. Management Science, 1975
- Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Probability Theory and Related Fields, 1975
- Estimation and control in Markov chains. Advances in Applied Probability, 1974
- Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Published by Springer Nature, 1970