A framework for studying the neurobiology of value-based decision making

Abstract
Most behavioural and computational models of decision making assume that the following five processes are carried out at the time the decision is made: representation, action valuation, action selection, outcome valuation, and learning. On the basis of a sizeable body of animal and human behavioural evidence, several groups have proposed the existence of three different types of valuation systems: Pavlovian, habitual and goal-directed systems. Pavlovian systems assign value to only a small set of 'prepared' behaviours and thus have a limited behavioural repertoire. Nevertheless, they might be the driving force behind behaviours with important economic consequences (for example, overeating). Examples include preparatory behaviours, such as approaching a cue that predicts food, and consummatory behaviours, such as ingesting available food. Habitual systems learn to assign values to stimulus–response associations on the basis of previous experience, through a process of trial and error. Examples of habits include a smoker's desire to have a cigarette at particular times of day (for example, after a meal) and a rat's tendency to forage in a cue-dependent location after sufficient training. Goal-directed systems assign values to actions by computing action–outcome associations and then evaluating the rewards that are associated with the different outcomes. An example of a goal-directed behaviour is deciding what to eat at a new restaurant. An important difference between habitual and goal-directed systems lies in how they respond to changes in the environment: the goal-directed system updates the value of an action as soon as the value of its outcome changes, whereas the habit system learns only with repeated experience. The values computed by the three systems can be modulated by factors such as the risk associated with the decision, the time delay to the outcomes, and social considerations. The quality of the decisions made by an animal depends on how its brain assigns control to the different valuation systems when the animal has to choose between several potential actions that are assigned conflicting values. The learning properties of the habit system seem to be well described by simple reinforcement-learning algorithms, such as Q-learning. Some of the key computations that are predicted by these models are instantiated in the dopamine system.
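The abstract's closing point, that habit learning is well described by simple reinforcement-learning algorithms such as Q-learning, with key computations (the reward prediction error) thought to be instantiated in the dopamine system, can be illustrated with a minimal sketch. The code below implements the stateless (bandit) special case of the Q-learning update on a toy two-action task; the task, reward probabilities, and parameter values are hypothetical assumptions introduced for illustration and are not taken from the article.

```python
import random

# Minimal sketch of incremental value learning (stateless Q-learning / delta rule).
# The two-action task, reward probabilities and parameters below are illustrative
# assumptions, not details from the article.

ACTIONS = ["press_lever", "pull_chain"]
TRUE_REWARD_PROB = {"press_lever": 0.8, "pull_chain": 0.2}  # hypothetical environment

ALPHA = 0.1    # learning rate
EPSILON = 0.1  # exploration rate for epsilon-greedy action selection

Q = {a: 0.0 for a in ACTIONS}  # cached ("habit-like") action values

def choose_action():
    """Epsilon-greedy selection over the cached action values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[a])

for trial in range(1000):
    action = choose_action()
    # Sample a binary reward from the hypothetical environment.
    reward = 1.0 if random.random() < TRUE_REWARD_PROB[action] else 0.0
    # Reward prediction error: the quantity that phasic dopamine responses
    # are hypothesised to report.
    delta = reward - Q[action]
    # Incremental update: cached values change only with repeated experience.
    Q[action] += ALPHA * delta

print(Q)  # learned values approach the underlying reward probabilities
```

Because the update is incremental, the cached values lag behind sudden changes in outcome value, which mirrors the distinction drawn above: a habit system adapts only through repeated experience, whereas a goal-directed system can revalue an action as soon as the value of its outcome changes.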