A Hierarchical Network of Provably Optimal Learning Control Systems: Extensions of the Associative Control Process (ACP) Network
- 1 January 1993
- Journal article
- Published by SAGE Publications in Adaptive Behavior
- Vol. 1 (3), 321-352
- https://doi.org/10.1177/105971239300100303
Abstract
An associative control process (ACP) network is a learning control system that can reproduce a variety of animal learning results from classical and instrumental conditioning experiments (Klopf, Morgan, & Weaver, 1993; see also the companion article, "A Hierarchical Network of Control Systems that Learn"). The ACP networks proposed and tested by Klopf, Morgan, and Weaver are not guaranteed, however, to learn optimal policies for maximizing reinforcement. Optimal behavior is guaranteed for a reinforcement learning system such as Q-learning (Watkins, 1989), but simple Q-learning is incapable of reproducing the animal learning results that ACP networks reproduce. We propose two new models that reproduce the animal learning results and are provably optimal. The first model, the modified ACP network, embodies the smallest number of changes to the ACP network necessary to guarantee that optimal policies will be learned while the animal learning results are still reproduced. The second model, the single-layer ACP network, embodies the smallest number of changes to Q-learning necessary to guarantee that the animal learning results are reproduced while optimal policies are still learned. We also propose a hierarchical network architecture within which several reinforcement learning systems (e.g., Q-learning systems, single-layer ACP networks, or any other learning controllers) can be combined in a hierarchy. We implement this architecture by combining four single-layer ACP networks to form a controller for a standard inverted-pendulum dynamic control problem. On this benchmark, the hierarchical controller learns more reliably, and more than an order of magnitude faster, than either the single-layer ACP network alone or the Barto, Sutton, and Anderson (1983) learning controller.
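For readers unfamiliar with the Q-learning baseline the abstract refers to, the sketch below shows the standard one-step tabular update from Watkins (1989), which learns optimal policies under the usual convergence conditions. This is a generic illustration, not the paper's ACP network or hierarchical controller; the environment interface (`reset`/`step`) and all parameter values are assumptions for the sake of a self-contained example.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning sketch (Watkins, 1989), for reference only.

    `env` is a hypothetical discrete environment with reset() -> state and
    step(action) -> (next_state, reward, done); the paper's single-layer ACP
    network modifies this kind of rule to also reproduce animal-learning
    phenomena.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration over the discrete action set
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # One-step update toward the greedy backup target
            best_next = max(Q[(next_state, a)] for a in actions)
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```

The greedy backup (the max over next-state action values) is what distinguishes Q-learning from the ACP network's purely associative update, and it is the property the paper's single-layer ACP network preserves to retain the optimality guarantee.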
References
- The convergence of TD(λ) for general λ. Machine Learning, 1992
- New Approaches to Robotics. Science, 1991
- Drive-reinforcement learning: a self-supervised model for adaptive control. Network: Computation in Neural Systems, 1990
- A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 1990
- A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 1986
- Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 1983
- Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 1981
- Learning Automata - A Survey. IEEE Transactions on Systems, Man, and Cybernetics, 1974
- Some Studies in Machine Learning Using the Game of Checkers. II—Recent Progress. IBM Journal of Research and Development, 1967
- Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 1959