Punish/Reward: Learning with a Critic in Adaptive Threshold Systems
- 1 September 1973
- journal article
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics
- Vol. SMC-3, No. 5, pp. 455-465
- https://doi.org/10.1109/tsmc.1973.4309272
Abstract
An adaptive threshold element is able to "learn" a strategy of play for the game of blackjack (twenty-one) with performance close to that of the Thorp optimal strategy, although the adaptive system has no prior knowledge of the game or of the objective of play. After each winning game the decisions of the adaptive system are "rewarded." After each losing game the decisions are "punished." Reward is accomplished by adapting while accepting the actual decision as the desired response. Punishment is accomplished by adapting while taking the desired response to be the opposite of the actual decision. This learning scheme differs from both "learning with a teacher" and "unsupervised learning." It involves "bootstrap adaptation," or "learning with a critic." The critic rewards decisions that belong to successful chains of decisions and punishes all other decisions. A general analytical model for learning with a critic is formulated and analyzed. The model represents bootstrap learning per se. Although the hypotheses on which the model is based do not perfectly fit blackjack learning, the model is applied heuristically to predict adaptation rates, with good experimental success. New applications of bootstrap learning are being explored in adaptive controls and multilayered adaptive systems.
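The reward/punish rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the threshold element is assumed to be an Adaline-style unit whose binary output is the sign of a weighted sum, and it is assumed to adapt by the normalized (alpha-)LMS rule. The blackjack encoding is not reproduced; the functions `train_single_decision_games` and `agreement` and the hidden weight vector `w_true` are hypothetical stand-ins for a game whose outcome the critic reports only as win or loss.

```python
import random

# Hypothetical sketch: an Adaline-style adaptive threshold element trained by
# punish/reward ("learning with a critic"). The normalized alpha-LMS rule is an
# assumption, not necessarily the paper's exact adaptation rule.


def decide(weights, x):
    """Return the binary decision (+1 or -1) and the analog sum w.x."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return (1 if s >= 0 else -1), s


def adapt(weights, x, desired, alpha):
    """One alpha-LMS (Widrow-Hoff) step driving w.x toward `desired`."""
    _, s = decide(weights, x)
    error = desired - s
    norm_sq = sum(xi * xi for xi in x) or 1.0  # guard against a zero input
    return [w + alpha * error * xi / norm_sq for w, xi in zip(weights, x)]


def critic_update(weights, chain, won, alpha):
    """Reward or punish every (x, decision) pair made during one game.

    Reward: the actual decision is accepted as the desired response.
    Punish: the opposite of the actual decision is the desired response.
    """
    for x, decision in chain:
        desired = decision if won else -decision
        weights = adapt(weights, x, desired, alpha)
    return weights


def train_single_decision_games(n_games=5000, dim=4, alpha=0.02, seed=0):
    """Hypothetical toy game: one decision per game, won iff the decision
    matches sign(w_true . x). The learner sees only the win/loss verdict."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(dim)]
    weights = [0.0] * dim
    for _ in range(n_games):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        decision, _ = decide(weights, x)
        won = decision == decide(w_true, x)[0]
        weights = critic_update(weights, [(x, decision)], won, alpha)
    return w_true, weights


def agreement(w_true, weights, n=2000, seed=1):
    """Fraction of fresh inputs on which the learned element matches the rule."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(len(w_true))]
        hits += decide(weights, x)[0] == decide(w_true, x)[0]
    return hits / n


if __name__ == "__main__":
    w_true, learned = train_single_decision_games()
    print(f"agreement with hidden rule: {agreement(w_true, learned):.3f}")
```

With one binary decision per game, the critic's verdict fully determines the correct response, so this toy case reduces to ordinary LMS training. In blackjack a game involves a chain of decisions, and a loss also punishes the correct decisions in that chain; that noisier signal is the bootstrap case the abstract describes.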