Abstract
A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed in the case where the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme for discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
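For orientation, the discrete state/action Q-learning recursion that such a scheme generalizes can be written as follows; the notation here is generic rather than the paper's own ($\gamma_n$ denotes a stepsize sequence, $\beta \in (0,1)$ a discount factor, $r(x,a)$ the one-stage reward, and $X_{n+1}$ a simulated transition from state $x$ under action $a$):

$$Q_{n+1}(x,a) = Q_n(x,a) + \gamma_n \Bigl[\, r(x,a) + \beta \max_{b} Q_n(X_{n+1}, b) - Q_n(x,a) \,\Bigr],$$

with $\max$ replaced by $\min$ when costs rather than rewards are optimized. The continuous-space extension analyzed in the paper replaces this tabular update with one suited to compact Euclidean state and action spaces.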
