Adding Temporary Memory to ZCS

Abstract
In a recent article, Wilson (1994) described a "zeroth-level" classifier system (ZCS). ZCS employs a reinforcement learning technique comparable to Q-learning (Watkins, 1989). This article presents results from the first reconstruction of ZCS. Having replicated Wilson's results, we extend ZCS in a manner suggested by Wilson: The original formulation of ZCS has no memory mechanisms, but Wilson (1994b) suggested how internal "temporary memory" registers could be added. We show results from adding one-bit and two-bit memory registers to ZCS. Our results demonstrate that ZCS can exploit memory facilities efficiently in non-Markov environments. We also show that the memoryless ZCS can converge on near-optimal stochastic solutions in non-Markov environments. We then present results from trials using ZCS in Markov environments that require increasingly long chains of actions before reward is received. Our results indicate that inaccurate overgeneral classifiers can interact with the classifier-generation mechanisms to cause catastrophic breakdowns in overall system performance. Basing classifier fitness on accuracy may alleviate this problem. We conclude that the memory mechanism in its current form is unlikely to scale well for situations requiring large amounts of temporary memory. Nevertheless, the ability to find stochastic solutions when there is insufficient memory might offset this problem somewhat.

This publication has 15 references indexed in Scilit: