Action Candidate-Driven Clipped Double Q-Learning in Continuous Control Tasks
Keywords: Deep reinforcement learning, Clipped double Q-learning, Action candidate, Estimation bias
Full paper under review
Haohui Chen / School of Automation, Central South University
Aoxiang Liu / School of Automation, Central South University
    Deep reinforcement learning algorithms are widely used in continuous control tasks, but they often estimate Q-functions inaccurately and suffer performance degradation in high-dimensional state-action spaces. To alleviate these problems, we propose TD6 (TD3 + 3 additions), a novel algorithm that extends the twin delayed deep deterministic policy gradient (TD3) algorithm with three key enhancements. First, a shared set of action candidates enables the two Q-networks to select optimal actions independently. Second, building on double Q-learning principles, TD6 cross-evaluates each network's selected action with the other network, decoupling action selection from action evaluation to improve stability and estimation precision. Third, TD6 combines the action-candidate mechanism with clipped double Q-learning through a regularization factor, balancing the overestimation bias inherent in the deep deterministic policy gradient (DDPG) algorithm against the underestimation bias observed in TD3. Experimental results demonstrate that TD6 significantly reduces estimation bias and outperforms TD3 and DDPG on continuous control benchmarks.
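    The abstract does not spell out the target computation, so the sketch below is one plausible PyTorch rendering of the mechanism it describes, not TD6's actual update rule. The function name td6_target, the candidate count, the noise parameters, the averaged cross-evaluation term, and the way the regularization factor beta blends the two targets are all illustrative assumptions; q1_target and q2_target stand for the two target critics.

```python
import torch

def td6_target(q1_target, q2_target, next_states, policy_action,
               rewards, dones, gamma=0.99, beta=0.5,
               num_candidates=10, noise_std=0.2, noise_clip=0.5,
               max_action=1.0):
    # Hypothetical sketch of the target described in the abstract.
    batch, act_dim = policy_action.shape

    # Shared candidate set: perturb the target policy's action with
    # clipped Gaussian noise (TD3-style target policy smoothing).
    noise = (torch.randn(batch, num_candidates, act_dim) * noise_std
             ).clamp(-noise_clip, noise_clip)
    candidates = (policy_action.unsqueeze(1) + noise).clamp(-max_action, max_action)

    flat_s = next_states.unsqueeze(1).expand(-1, num_candidates, -1) \
                        .reshape(batch * num_candidates, -1)
    flat_a = candidates.reshape(batch * num_candidates, act_dim)
    q1_vals = q1_target(flat_s, flat_a).reshape(batch, num_candidates)
    q2_vals = q2_target(flat_s, flat_a).reshape(batch, num_candidates)

    # Each critic independently picks its best candidate ...
    a1_idx = q1_vals.argmax(dim=1)
    a2_idx = q2_vals.argmax(dim=1)

    rows = torch.arange(batch)
    # ... and each choice is evaluated by the OTHER critic
    # (double-Q decoupling of selection and evaluation).
    cross_q = 0.5 * (q2_vals[rows, a1_idx] + q1_vals[rows, a2_idx])

    # Clipped double-Q term: pessimistic minimum over both critics.
    clipped_q = torch.min(q1_vals[rows, a1_idx], q2_vals[rows, a2_idx])

    # Assumed blend: beta trades the pessimistic (TD3-like) term
    # against the cross-evaluated (double-Q-like) term.
    q_next = beta * clipped_q + (1.0 - beta) * cross_q
    return rewards + gamma * (1.0 - dones) * q_next.unsqueeze(1)
```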
Important Dates
  • Conference dates: August 22–24, 2025
  • Draft submission deadline: May 6, 2025

Organizer
Technical Committee on Fault Diagnosis and Safety for Technical Processes, Chinese Association of Automation
Hosts
Xinjiang University
Xinjiang Association of Automation