Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms