Abstract
This paper deals with the optimal on-line control of a stationary, repetitive, single-stage, discrete-state Markov process whose statistical description is initially unknown. In the presence of uncertainty concerning the optimal control policy, the controller must perform the dual functions of estimation and control. Evidently, in the course of learning the process parameters, it is necessary to probe the system with inputs which appear in retrospect to be non-optimal from the control standpoint. The algorithm which orders these non-optimal decisions is termed a strategy. The ideal dual strategy is defined as that which extracts the maximum information from the process for a given cost, the term ‘information’ here denoting a decrease in error probability. An examination of the foregoing definition reveals the close relationship between the estimation procedure and the system cost function. It is shown that this relationship can be expressed as a simple equality constraint which is readily implemented as an on-line control strategy. Numerical results are presented which confirm the effectiveness of the strategy.
