Abstract
A class of asymptotically ε-optimal two-armed bandit controllers is given, and two criteria for comparing the long-term finite-time performance of controllers in this class are proposed. The performances of three particular controllers are compared using the criteria, and the analysis is confirmed by computer iteration of the appropriate probability recurrence relations.
