Reward revision and the average reward Markov decision process