Abstract
We consider a Markov renewal decision model with a countable state space and an unbounded reward function. The transition probability depends on an unknown parameter θ. We give weak bounding and recurrence conditions under which adapting the control according to the outcome of an estimator for θ yields a procedure that is optimal uniformly in θ with respect to the expected average reward criterion. In a detailed example, these results are applied to a controllable M/G/1 queueing model with unknown arrival rate and unknown service time distribution.