Comparison of Expedient and Optimal Reinforcement Schemes for Learning Systems

1 January 1972

journal article
research article
Published by Taylor & Francis in Journal of Cybernetics

Vol. 2 (1), 21-37
https://doi.org/10.1080/01969727208548637

Abstract

Stochastic automata models have been used successfully in the past for modeling learning systems. An automaton model with a variable structure reacts to inputs from a random environment by changing the probabilities of its actions. These changes are carried out by a reinforcement scheme in such a manner that the automaton evolves to a final structure that is satisfactory in some sense. Several reinforcement schemes have been proposed in the literature for updating the structure of automata [1–4]. Most of these are expedient schemes: in the limit they yield structures that perform better than a device that chooses its actions with equal probabilities irrespective of the environment's response. A few schemes suggested recently, called optimal schemes, lead in the limit to the continuous selection of a single optimal action as the output of the automaton when it operates in a stationary environment [5–7]. The question naturally arises as to which of these schemes is to be preferred in practical applications. In view of the anticipated extensive use of learning schemes in multilevel decision-making systems, this question of optimality versus expediency takes on particular significance. Consequently, a comparison has to be made not merely of individual automata schemes but also of the effectiveness of such schemes in situations involving several automata (e.g. stochastic games, multilevel systems).
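
The abstract does not state any particular update rule, so the following is only a minimal sketch of how a variable-structure automaton might adjust its action probabilities, using two textbook linear schemes rather than the schemes actually compared in the paper. With equal reward and penalty step sizes the rule is the linear reward-penalty scheme, which is commonly classed as expedient; with a zero penalty step it is the linear reward-inaction scheme, which is usually described as ε-optimal, a weaker property than the optimality defined above. The names `update`, `simulate`, `a`, `b`, and `penalty_probs` are illustrative assumptions, not taken from the source.

```python
import random


def update(p, action, penalized, a=0.05, b=0.0):
    """Return the action probabilities after one environment response.

    a is the reward step size and b the penalty step size (both assumed).
    b == 0 gives linear reward-inaction; b == a gives linear reward-penalty.
    """
    r = len(p)
    q = list(p)
    if not penalized:
        # Reward: shift probability mass toward the chosen action.
        for j in range(r):
            q[j] = p[j] + a * (1.0 - p[j]) if j == action else (1.0 - a) * p[j]
    else:
        # Penalty: shift probability mass away from the chosen action
        # and spread it evenly over the other actions.
        for j in range(r):
            q[j] = (1.0 - b) * p[j] if j == action else b / (r - 1) + (1.0 - b) * p[j]
    return q


def simulate(penalty_probs, steps=5000, a=0.05, b=0.0, seed=0):
    """Run one automaton in a stationary random environment.

    penalty_probs[i] is the (assumed) probability that action i is penalized.
    """
    rng = random.Random(seed)
    r = len(penalty_probs)
    p = [1.0 / r] * r  # start from the equal-probability device
    for _ in range(steps):
        action = rng.choices(range(r), weights=p)[0]
        penalized = rng.random() < penalty_probs[action]
        p = update(p, action, penalized, a, b)
    return p


if __name__ == "__main__":
    env = [0.2, 0.8]  # hypothetical environment: action 0 is penalized less often
    print("reward-penalty  :", simulate(env, a=0.05, b=0.05))
    print("reward-inaction :", simulate(env, a=0.05, b=0.0))
```

Both branches of `update` keep the probabilities summing to one, so the only difference between the two schemes in this sketch is whether a penalty changes the structure at all.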

This publication has 4 references indexed in Scilit:

A two-level system of stochastic automata for periodic random environments
Published by Institute of Electrical and Electronics Engineers (IEEE), 1971
Stochastic Automata Games
IEEE Transactions on Systems Science and Cybernetics, 1969
Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria
IEEE Transactions on Systems Science and Cybernetics, 1969
On Expediency and Convergence in Variable-Structure Automata
IEEE Transactions on Systems Science and Cybernetics, 1968