Distributional Soft Actor-Critic with Adaptive Entropy Regularization: An Extended Theoretical Analysis
AUT Journal of Modeling and Simulation
Volume 57, Issue 2, Esfand 2025, Pages 215-228
Article type: Research Article
DOI: 10.22060/miscj.2026.23574.5387
Authors
Meysam Fozi; Mohammad Mehdi Ebadzadeh*
Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
Abstract
In reinforcement learning, a key challenge is overestimation of the Q-value function, which can degrade policy performance. This paper introduces an adaptive extension of the distributional soft actor-critic (DSAC) algorithm for continuous control tasks, aimed at mitigating Q-value overestimation. The proposed approach also addresses the exploration-exploitation trade-off by taking model variance into account. We develop and evaluate four versions of this extension, each using a different entropy regularization technique: linearly decaying, exponentially decaying, linear adaptive, and exponentially adaptive regularization. These techniques are applied during training to balance exploration and exploitation more effectively. Experiments on the MuJoCo humanoid control tasks provided through OpenAI Gym show that the exponentially adaptive variant of DSAC performs significantly better than both the baseline method and the other proposed variants. This result highlights the importance of adaptive entropy regularization in reinforcement learning, particularly for tasks that require fine-grained control in continuous environments. The findings suggest that the proposed adaptive DSAC algorithm both improves learning stability by reducing overestimation and handles the exploration-exploitation dilemma more efficiently, offering a promising direction for future research on continuous control.
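The abstract names four entropy regularization schedules but does not give their formulas. The following is a minimal Python sketch, assuming that each schedule sets the SAC entropy temperature alpha and that the "adaptive" variants are driven by the standard deviation of the critic's return distribution (one possible reading of "model variance"). All function names, default values, and functional forms here are hypothetical illustrations, not the authors' definitions. The `soft_target` helper shows where alpha enters the scalar soft Bellman target of standard SAC. DSAC itself learns a return distribution rather than a scalar Q-value.

```python
import math

# Hypothetical schedules for the SAC entropy temperature alpha. The abstract
# names four variants but gives no formulas; every functional form and default
# value below is an illustrative assumption, not the authors' definition.


def linear_decay(step: int, total_steps: int,
                 alpha_start: float = 0.2, alpha_end: float = 0.01) -> float:
    """Anneal alpha linearly from alpha_start to alpha_end, then hold it."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(step / total_steps, 1.0)
    return alpha_start + (alpha_end - alpha_start) * frac


def exponential_decay(step: int, rate: float = 1e-5,
                      alpha_start: float = 0.2, alpha_end: float = 0.01) -> float:
    """Decay alpha exponentially from alpha_start toward the floor alpha_end."""
    return alpha_end + (alpha_start - alpha_end) * math.exp(-rate * step)


def linear_adaptive(sigma: float, sigma_max: float,
                    alpha_min: float = 0.01, alpha_max: float = 0.2) -> float:
    """Scale alpha linearly with the critic's return std sigma (an assumed
    proxy for 'model variance'), clipped to [alpha_min, alpha_max]."""
    ratio = min(max(sigma / sigma_max, 0.0), 1.0)
    return alpha_min + (alpha_max - alpha_min) * ratio


def exponential_adaptive(sigma: float, k: float = 1.0,
                         alpha_min: float = 0.01, alpha_max: float = 0.2) -> float:
    """Saturating exponential in sigma: alpha_min at sigma = 0, approaching
    alpha_max as the critic's uncertainty grows."""
    return alpha_min + (alpha_max - alpha_min) * (1.0 - math.exp(-k * max(sigma, 0.0)))


def soft_target(reward: float, gamma: float, q_next: float,
                log_prob_next: float, alpha: float, done: bool) -> float:
    """Scalar soft Bellman target of standard SAC: alpha weights the entropy
    bonus -log pi(a'|s'). (DSAC learns a return distribution instead.)"""
    if done:
        return reward
    return reward + gamma * (q_next - alpha * log_prob_next)


if __name__ == "__main__":
    # Compare how alpha evolves under each hypothetical schedule.
    for step, sigma in [(0, 2.0), (250_000, 1.0), (1_000_000, 0.1)]:
        print(step,
              round(linear_decay(step, total_steps=1_000_000), 4),
              round(exponential_decay(step), 4),
              round(linear_adaptive(sigma, sigma_max=2.0), 4),
              round(exponential_adaptive(sigma), 4))
```

In this sketch, the two decaying schedules depend only on the training step. The two adaptive ones respond to the critic's uncertainty, so exploration can increase again whenever the learned return distribution widens.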
Keywords
Reinforcement Learning; Continuous Control; Distributional Soft Actor-Critic (DSAC); Entropy Regularization; Q-Value Overestimation