Finite-Time Performance of Some Two-Armed Bandit Controllers

1 March 1973

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Systems, Man, and Cybernetics

Vol. SMC-3 (2), 194-197
https://doi.org/10.1109/tsmc.1973.5408504

Abstract

A class of asymptotically ε-optimal two-armed bandit controllers is given, and two criteria for comparing the long-term finite-time performance of controllers in this class are proposed. The performances of three particular controllers are compared using the criteria, and the analysis is confirmed by computer iteration of the appropriate probability recurrence relations.
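
The abstract does not name the three controllers or state the recurrence relations, so the following is only an illustrative sketch of what "computer iteration of probability recurrence relations" can look like. It assumes a linear reward-inaction update, a scheme commonly described as ε-optimal in the learning-automata literature, and it is not taken from the paper. The step size `a`, the reward probabilities `d1` and `d2`, the starting value `p0`, and all function names are hypothetical choices made for this example.

```python
from collections import defaultdict


def step_distribution(dist, a, d1, d2):
    """Advance the exact distribution of p = P(choose arm 1) by one play.

    Assumed update rule (linear reward-inaction, not from the paper):
      arm 1 chosen and rewarded -> p moves toward 1 by a fraction a
      arm 2 chosen and rewarded -> p moves toward 0 by a fraction a
      no reward                 -> p is left unchanged
    """
    new = defaultdict(float)
    for p, w in dist.items():
        # Arm 1 is chosen with probability p and rewarded with probability d1.
        new[round(p + a * (1.0 - p), 12)] += w * p * d1
        # Arm 2 is chosen with probability 1 - p and rewarded with probability d2.
        new[round((1.0 - a) * p, 12)] += w * (1.0 - p) * d2
        # Unrewarded plays on either arm leave p where it was.
        new[round(p, 12)] += w * (p * (1.0 - d1) + (1.0 - p) * (1.0 - d2))
    return dict(new)


def expected_p(dist):
    """Mean probability of choosing arm 1 under the current distribution."""
    return sum(p * w for p, w in dist.items())


def run(n_steps, a=0.1, d1=0.8, d2=0.4, p0=0.5):
    """Iterate the recurrence n_steps times; return the final distribution
    and the history of E[p] after each play (index 0 is the start)."""
    dist = {p0: 1.0}
    history = [expected_p(dist)]
    for _ in range(n_steps):
        dist = step_distribution(dist, a, d1, d2)
        history.append(expected_p(dist))
    return dist, history


if __name__ == "__main__":
    _, history = run(10)
    for n, value in enumerate(history):
        print(n, round(value, 6))
```

Because the distribution is propagated exactly rather than sampled, the printed values of E[p] are finite-time quantities of the kind a simulation would only estimate. The number of distinct states can grow quickly with the number of plays, so this sketch is only practical for short horizons.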

This publication has 4 references indexed in Scilit:

A Note on the Linear Reinforcement Scheme for Variable-Structure Stochastic Automata
IEEE Transactions on Systems, Man, and Cybernetics, 1972
The two-armed-bandit problem with time-invariant finite memory
IEEE Transactions on Information Theory, 1970
Stochastic Computing Systems
Published by Springer Nature, 1969
Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria
IEEE Transactions on Systems Science and Cybernetics, 1969