Abstract
An essential component of an intelligent agent is the ability to observe, encode, and use information about its environment. Traditional approaches to Genetic Programming have focused on evolving functional or reactive programs that make only minimal use of state. This paper presents an approach for investigating the evolution of learning, planning, and memory using Genetic Programming. The approach uses a multi-phasic fitness environment that enforces the use of memory and allows fairly straightforward comprehension of the evolved representations. An illustrative problem of 'gold' collection is used to demonstrate the usefulness of the approach. The results indicate that the approach can evolve programs that store simple representations of their environments and use these representations to produce simple plans.

1. Introduction and overview

The ability to notice, store, and utilize information about the environment is an essential component of intelligent behavior. Storing a model or map of the environment allows an intelligent agent to plan its actions for maximal benefit and increases its problem-solving capacity. Typically, the solutions produced by evolutionary computation (EC) methods do not exploit this kind of learning. Instead, the solutions created by genetic methods tend to be functional or reactive, using state minimally, if at all. The 'learning' takes place at the population level, through the process of evolution, rather than at the individual level. The acquisition of mental models is not an entirely new area for EC. In the field of Genetic Algorithms (GA), neural network learning methods have often been combined with a GA that specifies the layout of the network or its initial weights (Belew et al., 1991; Ackley and Littman, 1991). However, these methods do not use explicit representations of state -- the 'memories' learned are stored in the weights and are thus closed both to introspection and to human understanding.
Additionally, the role that the neural net plays in the individual is often pre-specified, so the learned representations can be used only in pre-specified ways.

In Genetic Programming (GP), the use of branching operators that depend upon the state of the environment is common (Koza, 1992; see Kinnear, 1994 for a review of recent GP techniques). When an evolved program combines several actions and branching statements through the use of 'progn' statements, it incorporates an implicit use of state. However, this 'state' is at best an implicit representation of the world, is clearly not available for introspection, and can often be difficult to comprehend. Occasionally, GP applications allow the evolving programs to use a single variable of state (Andre, 1994; Koza, 1994), but this is hardly representational memory.

One successful use of representational structures with GP is Teller's work (1994a) on indexed memory. Teller evolved programs that could solve a simple problem -- pushing blocks up against the boundaries of a world, which he called Tartarus. Teller used an interesting strategy to necessitate the use of memory in his evolving programs -- he strictly limited the function sets so that the evolved programs could move only once per evaluation and received only very limited sensory feedback. Without using memory, only minimal fitness was possible. Although Teller's indexed memory is a valuable addition to GP, indexed memory used by a program that is executed many times suffers from two potential flaws. First, the representations of the world developed by the program are difficult to interpret. Second, despite Teller's proof that his indexed memory paradigm is Turing complete (Teller, 1994b), the fact that only a single action is performed by each execution of the evolved program requires an undesirably high degree of external control.
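Indexed memory, as described above, gives an evolved program a fixed array of cells accessed through read and write primitives. The following is a minimal sketch of that idea; the memory size, the wrap-around indexing, and the convention that a write returns the cell's old value are assumptions drawn from common descriptions of the paradigm, not details taken from this paper.

```python
MEM_SIZE = 20  # assumed memory size for illustration

class IndexedMemory:
    """A fixed array of integer cells exposed through READ/WRITE
    primitives, in the style of Teller's indexed memory."""

    def __init__(self, size=MEM_SIZE):
        self.cells = [0] * size

    def read(self, i):
        # Indices wrap around so any evolved integer expression
        # yields a valid address (an assumed convention).
        return self.cells[i % len(self.cells)]

    def write(self, value, i):
        # Store `value` at cell i and return the cell's old contents.
        old = self.cells[i % len(self.cells)]
        self.cells[i % len(self.cells)] = value
        return old
```

Because these primitives return ordinary values, they can appear anywhere in an evolved expression tree, which is what makes the resulting representations powerful but, as noted above, hard to interpret.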
Rather than the externally forced iteration required by Teller's paradigm, the ideal system might include iteration in the function set so that the evolved programs naturally incorporate iteration. This paper presents a general methodology for the investigation of the evolution of learning, memory, and planning that addresses several of the difficulties encountered in previous work. Because the system uses a multi-phasic fitness environment in which the input and output processes of the individual programs are kept separate, the evolved representations of the world are easily available and comprehensible. In addition, this paper will present experimental results indicating that the system is successful at evolving individuals that can store representations of their world and use these representations in the task of collecting 'gold'.
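The multi-phasic idea can be illustrated with a toy evaluation loop: a sensing phase whose only lasting effect is what it writes to memory, followed by an acting phase that can no longer sense the world. All names, the one-dimensional grid, and the phase functions below are illustrative assumptions, not the paper's actual fitness environment.

```python
# Hypothetical two-phase 'gold' collection sketch. The shared memory
# persists across phases, so any information the individual uses in
# phase 2 must have been explicitly stored during phase 1.

world = {"gold": {2, 5, 7}, "size": 10}  # gold at cells 2, 5, and 7
memory = [0] * world["size"]

def map_phase(world, memory):
    # Phase 1: the program may sense the world; no fitness is earned,
    # so only what it writes to memory matters.
    for cell in range(world["size"]):
        memory[cell] = 1 if cell in world["gold"] else 0

def act_phase(memory, world):
    # Phase 2: sensing is disabled; the program visits only cells its
    # memory marks, so fitness depends entirely on the stored map.
    collected = 0
    for cell, marked in enumerate(memory):
        if marked and cell in world["gold"]:
            collected += 1
    return collected

map_phase(world, memory)
score = act_phase(memory, world)  # 3 pieces of gold collected
```

Because the memory array is the only channel between the phases, inspecting it after phase 1 reveals the individual's representation of the world directly, which is the comprehensibility advantage claimed above.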
