Humans Can Adopt Optimal Discounting Strategy under Real-Time Constraints

Abstract
Critical to our many daily choices between larger delayed rewards, and smaller more immediate rewards, are the shape and the steepness of the function that discounts rewards with time. Although research in artificial intelligence favors exponential discounting in uncertain environments, studies with humans and animals have consistently shown hyperbolic discounting. We investigated how humans perform in a reward decision task with temporal constraints, in which each choice affects the time remaining for later trials, and in which the delays vary at each trial. We demonstrated that most of our subjects adopted exponential discounting in this experiment. Further, we confirmed analytically that exponential discounting, with a decay rate comparable to that used by our subjects, maximized the total reward gain in our task. Our results suggest that the particular shape and steepness of temporal discounting is determined by the task that the subject is facing, and question the notion of hyperbolic reward discounting as a universal principle. When we make a choice between two options, we compare the values of their outcomes and select the option with a larger value. However, what if one option leads to a larger delayed reward and the other leads to a smaller more immediate reward? Naturally, we assign a larger value for a larger reward, but it is “discounted” if the reward is to be delivered later. Thus, the value is a monotonically decreasing function of the delays. Previous behavioral studies have repeatedly demonstrated that humans and animals discount delayed rewards hyperbolically. This has practical importance, as hyperbolic discounting can sometimes lead to “irrational” preference reversal: for instance, an individual may prefer two apples in 51 days to one apple in 50 days, but if the days come closer, he prefers one apple today to two apples tomorrow. On the contrary, exponential discounting is always “rational,” as it predicts constant preference. Here, in a new task that mimics animal foraging, and that uses delayed monetary rewards, Schweighofer and colleagues showed that humans can also discount reward exponentially. Furthermore, it is remarkable that by adopting exponential discounting, their subjects maximized their total gain. Thus, depending on the task at hand, the authors' study suggests that humans can flexibly choose the type of reward discounting, and can exhibit rational behavior that maximizes long-term gains.