Distributional Soft Actor-Critic with Adaptive Entropy Regularization: An Extended Theoretical Analysis

Fozi, Meysam; Ebadzadeh, Mohammad Mehdi

doi:10.22060/miscj.2026.23574.5387

AUT Journal of Modeling and Simulation
Volume 57, Issue 2, Esfand 2025, Pages 215-228
Article type: Research Article
Authors
Meysam Fozi; Mohammad Mehdi Ebadzadeh*
Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
Abstract
In reinforcement learning, one of the key challenges is the overestimation of the Q-value function, which can negatively impact policy performance. This paper introduces an adaptive extension of the distributional soft actor-critic (DSAC) algorithm, designed specifically for continuous control tasks, with the goal of mitigating Q-value overestimation. In addition, the proposed approach addresses the exploration-exploitation trade-off by taking into account model variance. We develop and evaluate four distinct versions of this adaptive extension, each incorporating different entropy regularization techniques: linearly decaying, exponentially decaying, linear adaptive, and exponentially adaptive regularization. These regularization methods are applied during the training process to balance exploration and exploitation more effectively. Our experimental results, conducted on OpenAI’s MuJoCo humanoid control tasks, demonstrate that the exponentially adaptive entropy regularization version of the DSAC algorithm performs significantly better than both the baseline method and the other proposed extensions. This performance improvement highlights the importance of adaptive entropy regularization strategies in reinforcement learning, particularly for tasks requiring fine-tuned control in continuous environments. The findings suggest that the proposed adaptive DSAC algorithm not only enhances learning stability by reducing overestimation but also offers a more efficient solution to the exploration-exploitation dilemma, providing a promising direction for future research in reinforcement learning for continuous control settings.
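The two decaying schedules named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the start and end values of the entropy coefficient `alpha`, and the step-based parameterization are hypothetical and are not taken from the paper. The adaptive variants depend on model variance in a way the abstract does not specify, so they are not sketched.

```python
def linear_decay_alpha(step, total_steps, alpha_start=0.2, alpha_end=0.01):
    """Linearly interpolate the entropy coefficient from alpha_start to alpha_end.

    All default values are illustrative assumptions, not the paper's settings.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return alpha_start + frac * (alpha_end - alpha_start)


def exponential_decay_alpha(step, total_steps, alpha_start=0.2, alpha_end=0.01):
    """Decay the entropy coefficient geometrically, reaching alpha_end at total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return alpha_start * (alpha_end / alpha_start) ** frac


def soft_value(q_value, log_prob, alpha):
    """Entropy-regularized value term of soft actor-critic: Q - alpha * log pi."""
    return q_value - alpha * log_prob
```

A larger `alpha` weights the entropy bonus more heavily and so favors exploration; both schedules shift toward exploitation as training proceeds, with the exponential form dropping faster early on.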
Keywords
Reinforcement Learning; Continuous Control; Distributional Soft Actor-Critic (DSAC); Entropy Regularization; Q-Value Overestimation