Efficient Temporal Difference Learning with Adaptive λ

Bi Jinbo, Wu Cangpu. Efficient Temporal Difference Learning with Adaptive λ[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 1999, 8(3): 251-257.

Abstract

Aim: To find a more efficient learning method, based on temporal difference learning, for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ), with adaptive schemes for selecting the value of λ, was presented for absorbing Markov decision processes and implemented on a computer. Results and Conclusion: Simulations on shortest-path search problems show that using an adaptive λ in Q-learning based on TTD(λ) can speed up its convergence.
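
As an illustration of the quantity that a TTD(λ)-based learner bootstraps from, the following is a minimal sketch of a truncated λ-return in Python. It is not the authors' implementation: the abstract does not describe the adaptive scheme, so the per-step λ values are simply passed in as an input, and the function name, argument names, and the recursive form of the return are assumptions made for this example. In a Q-learning setting, each entry of next_values would be the maximum Q-value over actions in the successor state.

```python
def truncated_lambda_return(rewards, next_values, gamma, lambdas):
    """Truncated lambda-return over a window of m steps (illustrative sketch).

    rewards[k]     : reward received at step k of the window
    next_values[k] : value estimate of the state reached after step k
                     (for Q-learning: max over actions of Q in that state)
    gamma          : discount factor
    lambdas[k]     : lambda used at step k; how these are adapted is NOT
                     specified here and is left to the caller (assumption)
    """
    m = len(rewards)
    if not (len(next_values) == m and len(lambdas) == m):
        raise ValueError("rewards, next_values and lambdas must have equal length")

    # At the truncation point the return is fully bootstrapped.
    z = next_values[-1]
    # Work backward through the window, mixing the one-step bootstrap
    # with the longer return according to lambda.
    for k in range(m - 1, -1, -1):
        lam = lambdas[k]
        z = rewards[k] + gamma * ((1.0 - lam) * next_values[k] + lam * z)
    return z


if __name__ == "__main__":
    # lambda = 0 everywhere reduces to the one-step target r_0 + gamma * V(s_1).
    print(truncated_lambda_return([1, 1, 1], [2, 4, 6], 0.5, [0, 0, 0]))
    # lambda = 1 everywhere gives the m-step discounted return plus the final bootstrap.
    print(truncated_lambda_return([1, 1, 1], [0, 0, 0], 0.5, [1, 1, 1]))
```

Passing λ as a per-step sequence keeps the sketch agnostic about how λ is chosen: a fixed λ is a constant list, while an adaptive scheme would supply different values at different steps.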
