Estimation and control in discounted stochastic dynamic programming
- 1 January 1987
- journal article
- research article
- Published by Taylor & Francis in Stochastics
- Vol. 20 (1), 51-71
- https://doi.org/10.1080/17442508708833435
Abstract
The principle of estimation and control was introduced and studied independently by Kurano and Mandl under the average return criterion for models in which some of the data depend on an unknown parameter. Kurano and Mandl considered Markov decision models with finite state space and bounded rewards, and established conditions for the existence of a policy, based on a consistent estimator of the unknown parameter, that is optimal uniformly in the parameter. These results were extended by Kolonko to semi-Markov models with denumerable state space and unbounded rewards. The present paper considers the same principle of estimation and control for the discounted return criterion. The underlying semi-Markov decision model may have a denumerable state space and unbounded rewards. Conditions are established for the existence of a policy which is asymptotically discount optimal uniformly in the unknown parameter. The essential conditions are continuity and compactness conditions and a multiplicative form of the Foster criterion for positive recurrence of Markov chains, formulated here for Markov decision models. An application to the control of an M|G|1 queue is discussed.
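For orientation, the sketch below illustrates the plug-in ("estimation and control") idea in its simplest form: at each stage the unknown parameter is re-estimated from the observed history with a consistent estimator, and the controller then acts as if the estimate were the true parameter. This is a minimal discrete-time, finite-state, finite-parameter sketch; the estimator, the model, and all names in it are assumptions made for illustration and do not reproduce the paper's semi-Markov construction with denumerable state space and unbounded rewards.

```python
# Minimal sketch of the plug-in ("estimation and control") idea for a
# discounted Markov decision model with an unknown parameter, assuming a
# finite state space and a finite set of candidate parameters.  At each stage
# the parameter is re-estimated from the observed transitions and the
# controller acts optimally for the estimated model.
import numpy as np


def value_iteration(P, r, beta, tol=1e-8):
    """Return a greedy optimal stationary policy for a finite discounted MDP.

    P[a] is the transition matrix of action a, r[a] the reward vector of
    action a, and beta the discount factor in (0, 1).
    """
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    while True:
        q = np.array([r[a] + beta * P[a] @ v for a in range(n_actions)])
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return q.argmax(axis=0)
        v = v_new


def estimate_parameter(counts, models):
    """Pick the candidate parameter with the largest likelihood of the
    observed transition counts (a consistent estimator over a finite
    parameter set, used here purely for illustration)."""
    def log_lik(P):
        with np.errstate(divide="ignore", invalid="ignore"):
            ll = counts * np.log(np.stack(P))
        return np.where(counts > 0, ll, 0.0).sum()
    return max(models, key=lambda theta: log_lik(models[theta]))


def estimation_and_control(true_theta, models, rewards, beta, n_steps, rng):
    """Certainty-equivalence loop: estimate the parameter, then apply an
    action that is optimal for the estimated model."""
    P_true = models[true_theta]
    n_states = P_true[0].shape[0]
    counts = np.zeros((len(P_true), n_states, n_states))
    state, theta_hat = 0, None
    for _ in range(n_steps):
        theta_hat = estimate_parameter(counts, models)
        policy = value_iteration(models[theta_hat], rewards, beta)
        action = policy[state]
        next_state = rng.choice(n_states, p=P_true[action][state])
        counts[action, state, next_state] += 1
        state = next_state
    return theta_hat


# Usage: two candidate parameters, two states, two actions.  The final
# estimate typically identifies the true parameter once the informative
# action has been played often enough.
rng = np.random.default_rng(0)
models = {
    "theta1": [np.array([[0.9, 0.1], [0.2, 0.8]]),
               np.array([[0.5, 0.5], [0.5, 0.5]])],
    "theta2": [np.array([[0.1, 0.9], [0.8, 0.2]]),
               np.array([[0.5, 0.5], [0.5, 0.5]])],
}
rewards = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
print(estimation_and_control("theta2", models, rewards,
                             beta=0.9, n_steps=200, rng=rng))
```

The paper's question is when such a plug-in policy is asymptotically discount optimal uniformly in the unknown parameter; the sketch only shows the mechanics of the loop, not the continuity, compactness, and multiplicative Foster conditions under which that optimality holds.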
This publication has 9 references indexed in Scilit:
- Adaptive control of Markov chains, I: Finite parameter set. IEEE Transactions on Automatic Control, 1979
- Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion. Stochastic Processes and their Applications, 1979
- A characterization of geometric ergodicity. Probability Theory and Related Fields, 1979
- Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie. Published by Walter de Gruyter GmbH, 1978
- Markov decision processes and strongly excessive functions. Stochastic Processes and their Applications, 1978
- On Dynamic Programming with Unbounded Rewards. Management Science, 1975
- Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Probability Theory and Related Fields, 1975
- Estimation and control in Markov chains. Advances in Applied Probability, 1974
- Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Published by Springer Nature, 1970