Abstract:
To represent and store cost-to-go functions more compactly than lookup tables when scaling up average-reward Markov decision processes, state aggregation combined with the relative value iteration algorithm was used to approximate the value function, and the span seminorm together with the contraction mapping principle was used to analyse the algorithm's convergence. The Bellman equation for the state aggregation model was given. Convergence of the algorithm was proved, and an error bound for the proposed algorithm was presented under the assumption that the associated Bellman operator is a contraction with respect to the span seminorm.
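As a hedged illustration, and not the authors' exact formulation, the sketch below runs relative value iteration on a hypothetical 4-state, 2-action cost-minimising MDP whose states are hard-aggregated into two clusters. Assumptions: each state belongs to exactly one cluster through a map `phi`, each cluster's value is the uniform average of its member states' Bellman backups, and iteration stops once the span seminorm of `T r - r` falls below `tol`. All function names, parameters, and numbers are illustrative.

```python
from typing import Sequence


def span(x: Sequence[float]) -> float:
    """Span seminorm: max(x) - min(x); it is zero for constant vectors."""
    return max(x) - min(x)


def aggregated_bellman(r, costs, P, phi, n_clusters):
    """One backup (T r)(k) of an aggregated, cost-minimising Bellman operator.

    Assumed hard aggregation: state s takes the value r[phi[s]] of its cluster,
    and each cluster averages the backups of its member states uniformly.
    """
    n_states = len(phi)
    out = []
    for k in range(n_clusters):
        members = [s for s in range(n_states) if phi[s] == k]
        backups = [
            min(
                costs[s][a] + sum(P[s][a][t] * r[phi[t]] for t in range(n_states))
                for a in range(len(costs[s]))
            )
            for s in members
        ]
        out.append(sum(backups) / len(backups))
    return out


def relative_value_iteration(costs, P, phi, n_clusters, ref=0, tol=1e-9, max_iter=10_000):
    """Return (r, gain, iterations); r is normalised so that r[ref] == 0."""
    r = [0.0] * n_clusters
    for it in range(1, max_iter + 1):
        tr = aggregated_bellman(r, costs, P, phi, n_clusters)
        diff = [a - b for a, b in zip(tr, r)]  # T r - r
        r = [v - tr[ref] for v in tr]  # relative step: pin the reference cluster at 0
        if span(diff) < tol:
            # When a solution exists, min(diff) <= average cost <= max(diff);
            # report the midpoint of that bracket as the average-cost estimate.
            return r, (min(diff) + max(diff)) / 2, it
    raise RuntimeError("no span-seminorm convergence within max_iter")


# Hypothetical example: 4 states, 2 actions, clusters {0, 1} and {2, 3}.
costs = [[1.0, 2.0], [1.5, 0.5], [3.0, 1.0], [2.0, 2.5]]
P = [
    [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]],
    [[0.25, 0.25, 0.25, 0.25], [0.5, 0.2, 0.2, 0.1]],
    [[0.1, 0.1, 0.4, 0.4], [0.3, 0.3, 0.2, 0.2]],
    [[0.2, 0.3, 0.3, 0.2], [0.6, 0.2, 0.1, 0.1]],
]
phi = [0, 0, 1, 1]
r, gain, iters = relative_value_iteration(costs, P, phi, n_clusters=2)
print(f"relative values {r}, average cost ~ {gain:.6f}, {iters} iterations")
```

Because the aggregated operator commutes with adding a constant to every component, subtracting `tr[ref]` keeps the iterates bounded without changing `T r - r`. This is one reason the span seminorm, rather than the sup norm, is a natural measure of convergence here. All transition probabilities in the toy example are strictly positive, which is a simple (assumed) way to make the operator a span contraction.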