Abstract
A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed in the case where the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme for discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
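For orientation, the discrete state/action Q-learning recursion that such a scheme generalizes can be written as follows; the notation here is generic rather than the paper's own ($\gamma_n$ denotes a stepsize sequence, $\beta \in (0,1)$ a discount factor, $r(x,a)$ the one-stage reward, and $X_{n+1}$ a simulated transition from state $x$ under action $a$):

$$Q_{n+1}(x,a) = Q_n(x,a) + \gamma_n \Bigl[\, r(x,a) + \beta \max_{b} Q_n(X_{n+1}, b) - Q_n(x,a) \,\Bigr],$$

with $\max$ replaced by $\min$ when costs rather than rewards are optimized. The continuous-space extension analyzed in the paper replaces this tabular update with one suited to compact Euclidean state and action spaces.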
