Abstract
For Markov chains controlled by a team of agents, there is no generally applicable method for obtaining the optimal control policy if the delay in information sharing between the agents is more than one step. The authors consider such a problem for a Markov chain whose transition probability matrix consists of blocks, with the coupling between the blocks being on the order of ε, where ε is a small parameter. It is shown that if each block is controlled by only one agent, then it is possible to obtain policies arbitrarily close to the optimal control policy by exploiting the weak coupling between the blocks. The authors present a complete set of results for the finite-horizon case and discuss possible extensions to the infinite-horizon case.
