A Reward Scheme for Production Systems with Overlapping Conflict Sets

1 May 1986

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics

Vol. 16 (3), 369-383
https://doi.org/10.1109/tsmc.1986.4308969

Abstract

The reward-allocation problem for production systems in delayed-payoff situations is formalized in a conceptual model in which the environment of the system is a finite automaton. The environment state and the state of the system's local memory determine which productions are in the current conflict set. Productions are selected from the conflict set with probabilities proportional to their activations. Each selected production updates the local memory and furnishes the next input symbol to the environment. A reward scheme examines the payoff that is output by the environment and adjusts the activations in an attempt to increase average payoff per unit time. A reward scheme is safe if it is generally biased towards improvement. The notion of reward scheme safety is formalized, an asymptotically safe reward scheme is exhibited, and its safety is demonstrated. The demonstration is an analog of the proof of Fisher's fundamental theorem of natural selection.
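
The selection step described in the abstract (productions drawn from the conflict set with probabilities proportional to their activations) can be illustrated with a minimal Python sketch. The function names, the dictionary representation of activations, and the requirement that activations be strictly positive are assumptions made for illustration only; they are not taken from the paper, and the reward scheme itself is not reproduced here because the abstract does not specify it.

```python
import random


def selection_probabilities(conflict_set, activations):
    """Return each production's selection probability within the conflict set.

    Hypothetical helper: probability = activation / sum of activations
    over the productions currently in the conflict set.
    """
    total = sum(activations[p] for p in conflict_set)
    return {p: activations[p] / total for p in conflict_set}


def select_production(conflict_set, activations, rng=random):
    """Draw one production with probability proportional to its activation.

    Assumes (for this sketch) a non-empty conflict set and strictly
    positive activations.
    """
    weights = [activations[p] for p in conflict_set]
    if not conflict_set or any(w <= 0 for w in weights):
        raise ValueError("need a non-empty conflict set with positive activations")
    return rng.choices(conflict_set, weights=weights, k=1)[0]


# Illustrative values only: three productions, two of which are in the
# current conflict set.
activations = {"p1": 1.0, "p2": 3.0, "p3": 5.0}
conflict_set = ["p1", "p2"]
probs = selection_probabilities(conflict_set, activations)
chosen = select_production(conflict_set, activations, random.Random(0))
```

In this sketch, productions outside the conflict set (here "p3") do not affect the selection probabilities, however large their activations are. Conflict sets that share productions therefore compete through the same activation values, which is the overlapping situation the title refers to.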
