Markovian Sequential Replacement Processes

Open Access

1 December 1965

journal article
Published by Institute of Mathematical Statistics in The Annals of Mathematical Statistics

Vol. 36 (6) , 1677-1694
https://doi.org/10.1214/aoms/1177699796

Abstract

A sequential control process is a dynamic system which is observed periodically and classified into one of a number of possible states. After each observation one of a number of possible decisions is made. These decisions are the "control"; they determine the chance laws of the system. A replacement process is a control process with an additional special action, called replacement, which instantaneously returns the system to some initial state. Let $\mathbf{X}$ denote the state space of the system, assumed to be a Borel subset of finite dimensional Euclidean space. The case where $\mathbf{X}$ is finite has been treated by Derman [8], and thus $\mathbf{X}$ is considered infinite here. Let $\mathscr{B}$ be the $\sigma$-algebra of Borel sets in $\mathbf{X}$. Let $\{X_t; t = 0, 1, 2, \cdots\}$ be the sequence of states and $\{\Delta_t; t = 0, 1, 2, \cdots\}$ be the sequence of decisions. In a replacement problem it is assumed that there is a distinguished state $x_0 \epsilon \mathbf{X}$ with $X_0 = x_0$ with probability one. For any time $t$ let $S_t$ be the history of states and decisions up to and including time $t$. Let $\mathbf{A}$ be the set of possible actions, excluding replacement, where: $A1^\circ$ It is assumed that the action space $\mathbf{A}$ is a finite set with $n((\mathbf{A})$ elements. Since $\mathbf{A}$ is finite, assume $\mathbf{A} = \{1, 2, \cdots, n(\mathbf{A})\}$. Let $k_0 \not\varepsilon \mathbf{A}$ denote the replacement action. The action $k_0$ instantaneously returns the system to state $x_0$, and it may be followed by some action $k \varepsilon \mathbf{A}$ which "acts on" the state $x_0$. The pair $(k_0, k)$ itself constitutes a possible action. A decision at time $t$ is either a choice of an element $k \varepsilon \mathbf{A}$ or a choice of a pair $(k_0, k)$ with $k \varepsilon \mathbf{A}$. Let $\mathbf{A}_0$ be the total action space, where: $\mathbf{A}_0 = \mathbf{A} \cup \{ (k_0, k); k \varepsilon \mathbf{A}\}.$ There are $2n(\mathbf{A})$ elements in $\mathbf{A}_0$. Let $\Xi = \{\xi; \xi = \langle\xi_1, \cdots, \xi_{2n(\mathbf{A})}\rangle, \xi_j \geqq 0, \sum\xi_j = 1\}$ be the simplex of all probability distributions on $\mathbf{A}_0$. A sequential control rule is a function $D(s_{t - 1}, x) = \langle D_1(s_{t - 1}, x), \cdots, D_{2n(\mathbf{A})} (s_{t - 1}, x)\rangle$ of histories $s_{t - 1}$ and present states $x$ with values in $\Xi$. The interpretation is: At a history of $S_{t - 1} = s_{t - 1}$ and a present state $X_t = x$, decision $j \varepsilon \mathbf{A}_0$ is taken with probability $D_j(s_{t - 1}, x)$. In order that the integrals later to be written have meaning it is necessary to restrict attention to control rules $D(s_{t - 1}, x)$ which are Biare functions of their arguments. Let $\mathbf{R}$ be the space of all such control rules. A sequential control process is not specified until a "law of motion" is given. $A2^\circ$ It is assumed that for every $x \varepsilon \mathbf{X}$ and $k \varepsilon \mathbf{A}$ there exists a probability measure $Q(\cdot; x, k)$ on $\mathscr{B}$ such that for some version $\mathrm{Pr}\{X_{t + 1} \varepsilon B \mid S_{t - 1}, X_t = x, \Delta_t = k\} = Q(B; x, k)$; for every $B \varepsilon \mathscr{B}$ and history $S_{t - 1}$. For every $B \varepsilon \mathscr{B}$ and $k \varepsilon \mathbf{A}, Q(B; \cdot, k)$ is assumed to be a Baire function on $\mathbf{X}$. It is assumed that $Q(\cdot, x, k)$ is absolutely continuous with respect to some $\sigma$-finite measure $\mu$ on $\mathscr{B}$, and possessing a density $q(\cdot, x, k)$, also assumed to be a Baire function in $x$. Since $X_0 = x_0$ a.s., once a rule $R \varepsilon \mathbf{R}$ is specified, the sequences $\{X_t, t = 0, 1, 2, \cdots\}$ and $\{(X_t, \Delta_t); t = 0, 1, 2, \cdots\}$ are stochastic processes. The previous Assumption $A2^\circ$ imposes a structure similar to that of a Markov process in that the law of motion does not depend on the past history, but only on the present state. In a manner similar to Derman [9], the process $\{(X_t, \Delta_t); t = 0, 1, 2, \cdots\}$ will be called a Markovian sequential replacement process. It is not true that $\{X_t; t = 0, 1, \cdots\}$ nor even $\{(X_t, \Delta_t); t = 0, 1, \cdots\}$ will always be Markov processes; whether they are or not will depend on the rule $R$. Two assumptions particular to the development in this paper and insuring the ergodicity of the process are: $A3^\circ$ For every $x \varepsilon \mathbf{X}$ and $k \varepsilon \mathbf{A}$ it is assumed that $\lim_{x' \rightarrow x} \int |q(y; x, k) - q(y; x', k)| \mu(dy) = 0.$ $A4^\circ$ For every compact set $G \subset \mathbf{X}$ it is assumed that $\sup_{x \varepsilon G}\int_G q(y; x, k) \mu (dy) < 1$ for all $k \varepsilon \mathbf{A}$. The last assumption, $A4^\circ$, is stronger than needed, as may be seen in the examples in Section 4. However, it is easily verified and seems natural in many applications of the theory. Let $w(x, k)$ be the immediate cost whenever the system is in state $x \varepsilon \mathbf{X}$ and decision $k \varepsilon \mathbf{A}$ is made. It often occurs that the cost in an actual situation is a random variable whose distribution is determined by knowledge of the state and decision. In such a case, with some loss in generality, attention is restricted to $w(x, k)$ representing the expected one stage cost under the appropriate distribution. Let $K(x)$ be the cost of replacing a system in state $x$. If $w_0(\cdot, \cdot)$ is the cost function defined on $\mathbf{X} \times \mathbf{A}_0$ then the relationship is: $w_0(x, k) = w(x, k)\quad\text{for} k \neq k_0$ and $w_0(x, (k_0, k)) = K(x) + w(x_0, k)\quad\text{for} k \varepsilon \mathbf{A}.$ $A5^\circ$ Assume that $K(\cdot)$ is bounded and continuous with $0 \leqq K(x) \leqq M$ for all $x \varepsilon \mathbf{X}$. For every $k \varepsilon \mathbf{A}$ assume that $w(\cdot, k)$ is a non-negative continuous function on $\mathbf{X}$ with $\lim \inf_{x \rightarrow \infty} w(x, k)...

Keywords

This publication has 0 references indexed in Scilit: