Abstract
This paper deals with a nonstationary discrete-time Markov process whose transition probabilities vary periodically in time. Each transition yields a reward that varies with the same period as the transition matrix. For infinite-horizon processes, a policy-iteration algorithm is developed that effectively determines an optimal policy maximizing the total discounted reward. The paper extends R. A. Howard's policy-iteration technique for stationary Markov processes. A numerical example demonstrating the iteration algorithm is given.
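To fix ideas, the following is a minimal sketch of one way such a Howard-style scheme can be realized, assuming the cycle is folded into the state by tracking the current phase; value determination solves a linear system for the discounted values at every (phase, state) pair, and improvement picks the action maximizing the one-step lookahead. All names here (P, R, beta, policy_iteration_periodic) are hypothetical illustrations, not the paper's notation, and this is not necessarily the paper's exact formulation.

    import numpy as np

    # Hypothetical periodic MDP: N states, cycle length T, A actions.
    # P[t][a] is the N x N transition matrix in effect at phase t;
    # R[t][a] is the N-vector of expected one-step rewards at phase t;
    # beta in (0, 1) is the one-step discount factor.
    def policy_iteration_periodic(P, R, beta, max_iter=1000):
        T = len(P)                      # cycle length
        A = len(P[0])                   # number of actions
        N = P[0][0].shape[0]            # number of states
        # A policy assigns an action to every (phase, state) pair.
        policy = np.zeros((T, N), dtype=int)

        for _ in range(max_iter):
            # Value determination: solve for v[t][i], the discounted
            # value of state i at phase t under the current policy,
            # stacking (phase, state) into a single index.
            M = np.zeros((T * N, T * N))
            b = np.zeros(T * N)
            for t in range(T):
                nxt = (t + 1) % T       # phases advance cyclically
                for i in range(N):
                    a = policy[t, i]
                    row = t * N + i
                    M[row, row] += 1.0
                    M[row, nxt * N:(nxt + 1) * N] -= beta * P[t][a][i]
                    b[row] = R[t][a][i]
            v = np.linalg.solve(M, b).reshape(T, N)

            # Policy improvement: for each (phase, state), choose the
            # action maximizing reward plus discounted next-phase value.
            new_policy = np.zeros_like(policy)
            for t in range(T):
                nxt = (t + 1) % T
                q = np.array([R[t][a] + beta * P[t][a] @ v[nxt]
                              for a in range(A)])
                new_policy[t] = q.argmax(axis=0)

            if np.array_equal(new_policy, policy):
                return policy, v        # converged: policy is optimal
            policy = new_policy

        return policy, v

When the period T equals 1, this reduces to Howard's ordinary policy iteration for a stationary discounted Markov process.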