The asymptotic optimality of discretized linear reward-inaction learning automata

Abstract
The automata considered have a variable structure and are therefore completely described by their action probability updating functions. The action probabilities can take only a finite number of prespecified values. These values increase linearly, so the interval [0, 1] is divided into a number of equal-length subintervals. An automaton updates its action probabilities only when the environment responds with a reward; hence the name discretized linear reward-inaction automata. The asymptotic optimality of this family of automata is proved for all environments.
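
The updating scheme described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the two-action restriction, the step size of one grid point (1/N) per reward, the starting probability of 1/2, the stationary random environment in `run`, and all names (`DiscretizedLRI`, `resolution`, `reward_probs`). It is not the construction used in the paper's proof.

```python
import random


class DiscretizedLRI:
    """Two-action discretized linear reward-inaction automaton (illustrative sketch)."""

    def __init__(self, resolution):
        # Assumption: an even number N of equal subintervals, so 1/2 lies on the grid.
        if resolution < 2 or resolution % 2:
            raise ValueError("resolution must be an even integer >= 2")
        self.resolution = resolution
        # Integer state k encodes P(action 0) = k / N; start at 1/2.
        self.state = resolution // 2

    def prob_action0(self):
        return self.state / self.resolution

    def choose(self, rng):
        # Sample an action from the current action probabilities.
        return 0 if rng.random() < self.prob_action0() else 1

    def update(self, action, rewarded):
        if not rewarded:
            return  # inaction: a penalty leaves the probabilities unchanged
        if action == 0:
            # Reward for action 0: move its probability up one grid point.
            self.state = min(self.state + 1, self.resolution)
        else:
            # Reward for action 1: move P(action 0) down one grid point.
            self.state = max(self.state - 1, 0)


def run(reward_probs, resolution=20, steps=10_000, seed=0):
    """Simulate against a hypothetical stationary environment.

    reward_probs[i] is the assumed probability that action i is rewarded.
    Returns the final probability of choosing action 0.
    """
    rng = random.Random(seed)
    automaton = DiscretizedLRI(resolution)
    for _ in range(steps):
        action = automaton.choose(rng)
        rewarded = rng.random() < reward_probs[action]
        automaton.update(action, rewarded)
    return automaton.prob_action0()
```

In this sketch the probabilities 0 and 1 are absorbing: once reached, only one action is ever chosen and a reward cannot move the state further. Calling `run([0.8, 0.2], resolution=50)` for increasing values of `resolution` is one informal way to observe how often the automaton settles on the action with the higher reward probability.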
