Comparison of Expedient and Optimal Reinforcement Schemes for Learning Systems
- 1 January 1972
- journal article
- research article
- Published by Taylor & Francis in Journal of Cybernetics
- Vol. 2 (1), 21-37
- https://doi.org/10.1080/01969727208548637
Abstract
Stochastic automata models have been used successfully in the past for modeling learning systems. An automaton model with a variable structure reacts to inputs from a random environment by changing the probabilities of its actions. These changes are carried out using a reinforcement scheme in such a manner that the automaton evolves to a final structure which is satisfactory in some sense. Several reinforcement schemes have been proposed in the literature for updating the structure of automata [1–4]. Most of these are expedient schemes, which in the limit yield structures that perform better than a device that chooses its actions with equal probabilities irrespective of the environment's response. A few schemes, called optimal schemes, have also been suggested recently; when the automaton operates in a stationary environment, they lead in the limit to the continuous selection of a single optimal action as its output [5–7]. The question naturally arises as to which of the schemes are to be preferred in practical applications. In view of the anticipated extensive use of learning schemes in multilevel decision-making systems, this question of optimality versus expediency takes on particular significance. Consequently, a comparison has to be made not merely of individual automata schemes but also of the effectiveness of such schemes in situations involving several automata (e.g. stochastic games, multilevel systems).
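The probability update described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the schemes the paper compares, so the sketch assumes two textbook linear updates: linear reward-penalty (usually classed as expedient) and linear reward-inaction (usually classed as epsilon-optimal rather than strictly optimal). The names `update` and `simulate`, the step sizes `a` and `b`, and the penalty probabilities are hypothetical choices made for illustration only.

```python
import random


def update(p, action, penalized, a=0.05, b=0.05):
    """Apply one linear reinforcement step to the action probabilities p.

    Assumed textbook forms, not taken from the paper:
    b == a gives linear reward-penalty, b == 0 gives linear reward-inaction.
    """
    r = len(p)
    if not penalized:
        # Reward: shift probability mass toward the chosen action.
        q = [(1 - a) * pj for pj in p]
        q[action] = p[action] + a * (1 - p[action])
    else:
        # Penalty: shift mass away from the chosen action, spread over the rest.
        q = [b / (r - 1) + (1 - b) * pj for pj in p]
        q[action] = (1 - b) * p[action]
    return q


def simulate(penalty_probs, a, b, steps=5000, seed=0):
    """Run the automaton in a stationary random environment.

    penalty_probs[i] is the (hypothetical) probability that action i
    draws a penalty from the environment.
    """
    rng = random.Random(seed)
    r = len(penalty_probs)
    p = [1.0 / r] * r  # start from equal action probabilities
    for _ in range(steps):
        action = rng.choices(range(r), weights=p)[0]
        penalized = rng.random() < penalty_probs[action]
        p = update(p, action, penalized, a, b)
    return p


if __name__ == "__main__":
    env = [0.2, 0.8]  # action 0 is penalized least often
    print("reward-penalty :", simulate(env, a=0.05, b=0.05))
    print("reward-inaction:", simulate(env, a=0.05, b=0.0))
```

Under these assumptions, setting `b = 0` leaves the probabilities unchanged on a penalty, which is what allows the reward-inaction update to lock onto a single action, while the reward-penalty update keeps some probability on every action.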
This publication has 4 references indexed in Scilit:
- A two-level system of stochastic automata for periodic random environments. Published by Institute of Electrical and Electronics Engineers (IEEE), 1971
- Stochastic Automata Games. IEEE Transactions on Systems Science and Cybernetics, 1969
- Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria. IEEE Transactions on Systems Science and Cybernetics, 1969
- On Expediency and Convergence in Variable-Structure Automata. IEEE Transactions on Systems Science and Cybernetics, 1968