Value Function Approximation with State Aggregation

Abstract: To solve large-scale average-reward Markov decision processes (MDPs), the value (cost-to-go) function is represented and stored more compactly than in a lookup table. A relative value iteration algorithm with state aggregation is used to approximate the value function, and its convergence is analysed using the span semi-norm and the contraction mapping principle. The Bellman optimality equation for the state-aggregated model is given. Under a span-contraction condition, the convergence of the algorithm is proved and an error bound for the approximation is presented.
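
The idea summarised in the abstract can be illustrated with a small sketch. It is not the paper's algorithm as published, only one plausible reading of it under stated assumptions: states are hard-partitioned into clusters by a hypothetical map `phi`, the Bellman backup is averaged uniformly within each cluster, and rewards are maximised. The names `span` and `aggregated_rvi` and all parameters are invented for this example. The span semi-norm used in the stopping test is sp(v) = max v - min v.

```python
import numpy as np


def span(v):
    """Span semi-norm: sp(v) = max(v) - min(v)."""
    return float(np.max(v) - np.min(v))


def aggregated_rvi(P, r, phi, n_clusters, ref=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration on a piecewise-constant value function.

    P   : array (A, S, S), P[a, s, t] = probability of s -> t under action a
    r   : array (S, A), one-step reward
    phi : int array (S,), cluster index of each state (assumed encoding)
    ref : reference cluster whose value is pinned to zero
    Returns (w, gain): cluster values with w[ref] == 0, and the gain estimate.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    phi = np.asarray(phi)
    counts = np.bincount(phi, minlength=n_clusters)
    if np.any(counts == 0):
        raise ValueError("every cluster must contain at least one state")

    w = np.zeros(n_clusters)
    for _ in range(max_iter):
        v = w[phi]                              # disaggregate: v(s) = w[phi(s)]
        q = r + np.einsum("ast,t->sa", P, v)    # Q(s, a) = r(s, a) + sum_t P v
        tv = q.max(axis=1)                      # Bellman backup for each state
        # Aggregate: uniform average of the backup over each cluster.
        fw = np.bincount(phi, weights=tv, minlength=n_clusters) / counts
        gain = fw[ref]                          # gain estimate at the reference
        w_new = fw - gain                       # keep w[ref] == 0
        if span(w_new - w) < tol:               # stop when the iterates settle
            return w_new, gain
        w = w_new
    raise RuntimeError("no convergence; the span-contraction condition may fail")
```

For a two-state, single-action chain with all transition probabilities equal to 0.5 and rewards 1 and 3, calling `aggregated_rvi` with `phi=[0, 1]` (no aggregation) gives gain 2 and relative values (0, 2). Merging both states with `phi=[0, 0]` still gives gain 2, with a single cluster value of 0. In general, aggregation introduces an approximation error, which is what the error bound mentioned in the abstract addresses.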