Ordinal Dynamic Programming

Abstract
Numerically valued reward processes are found in most dynamic programming models. Mitten, however, recently formulated finite horizon sequential decision processes in which a real-valued reward need not be earned at each stage. Instead of the cardinality assumption implicit in past models, Mitten assumes that a decision maker has a preference order over a general collection of outcomes (which need not be numerically valued). This paper investigates infinite horizon ordinal dynamic programming models. Both deterministic and stochastic models are considered. It is shown that an optimal policy exists if and only if some stationary policy is optimal. Moreover, "policy improvement" leads to better policies using either Howard-Blackwell or Eaton-Zadeh procedures. The results illuminate the roles played by various sets of assumptions in the literature on Markovian decision processes.
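The policy-improvement result can be illustrated with a small sketch. This is not the paper's construction (the paper deliberately avoids assigning numbers to outcomes); it only shows the Howard-style improvement loop on a toy deterministic process whose outcomes are non-numeric labels. The states, actions, outcome labels, preference ranking, and discount factor below are all hypothetical, and mapping a finite total preference order to integer ranks is an added assumption used purely to make the loop runnable.

```python
# Hypothetical illustration: a finite deterministic decision process whose
# per-stage outcomes are labels ordered only by a preference ranking. Since
# the outcome set here is finite and totally ordered, integer ranks serve as
# a numeric stand-in, and Howard-style policy improvement applies.

transitions = {  # (state, action) -> (next_state, outcome_label)
    ("s0", "a"): ("s0", "poor"),
    ("s0", "b"): ("s1", "fair"),
    ("s1", "a"): ("s1", "good"),
    ("s1", "b"): ("s0", "fair"),
}
preference = ["poor", "fair", "good"]            # worst to best
rank = {o: i for i, o in enumerate(preference)}  # ordinal rank surrogate
BETA = 0.9  # discount factor; an assumption made so values converge

def evaluate(policy, n_iter=500):
    """Discounted rank-value of a stationary policy (successive approximation)."""
    v = {s: 0.0 for (s, _) in transitions}
    for _ in range(n_iter):
        v = {s: rank[transitions[(s, policy[s])][1]]
                + BETA * v[transitions[(s, policy[s])][0]]
             for s in v}
    return v

def improve(policy):
    """One Howard-style improvement step; returns the greedy policy."""
    v = evaluate(policy)
    new = {}
    for s in v:
        acts = [a for (st, a) in transitions if st == s]
        new[s] = max(acts, key=lambda a: rank[transitions[(s, a)][1]]
                                         + BETA * v[transitions[(s, a)][0]])
    return new

# Iterate improvement until the policy is stable; the fixed point is a
# stationary policy, echoing the abstract's existence result in this
# numeric surrogate setting.
policy = {"s0": "a", "s1": "b"}
while True:
    nxt = improve(policy)
    if nxt == policy:
        break
    policy = nxt
print(policy)
```

In this toy instance the loop stabilizes at the stationary policy that reaches and stays on the most-preferred outcome, i.e. choosing `b` in `s0` and `a` in `s1`.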
