
Double Deep Q Network with Adaptive Prioritized Experience Replay
AUT Journal of Modeling and Simulation
Articles in Press, Accepted Manuscript, available online from 12 Mordad 1404
Article type: Research Article
DOI: 10.22060/miscj.2025.23426.5373
Authors
Mohammad Mahdi Ebadzadeh*1; Majid Adibian2
1Amirkabir University of Technology
2Amirkabir University of Technology
Abstract
In deep reinforcement learning, experience replay buffers help break the correlation of sequential data and improve the efficiency of learning from past experiences. Prioritized Experience Replay (PER) enhances this process by selecting transitions based on their temporal difference (TD) error. However, PER does not account for how often a transition has been used or its overall importance. To address this, we introduce an adaptive prioritization method that incorporates three additional transition-level factors: reward, usage count (counter), and policy probability, collectively termed RCP values. Each RCP value is normalized and combined with the TD error to determine the selection probability of transitions from the replay buffer. We test our approach on several Atari game environments and find that using any of the RCP values individually improves performance over standard PER. To leverage all three RCP components, we evaluate three aggregation strategies: taking the minimum, maximum, or mean of the RCP values. Results show that while the best aggregation method varies by environment, the mean function consistently delivers stable performance improvements, likely because it balances the influence of all three factors and prevents over-reliance on any single one. Our findings indicate that incorporating RCP values provides a straightforward and effective improvement over conventional prioritization methods in experience replay.
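The abstract does not spell out the exact normalization or combination rule, but the described scheme (normalize each RCP factor, aggregate by min/max/mean, then combine with the TD error to form sampling priorities) can be sketched roughly as follows. All function and variable names here are illustrative, and the min-max normalization, the direction in which counters and policy probabilities are inverted, and the multiplicative combination with |TD error| are assumptions, not the paper's verified formulas:

```python
import numpy as np

def rcp_priorities(td_errors, rewards, counters, policy_probs,
                   aggregate="mean", eps=1e-6):
    """Illustrative sketch of RCP-style adaptive prioritization.

    Each factor is min-max normalized to [0, 1]. Heavily reused
    transitions and transitions the current policy already assigns
    high probability are assumed to deserve *lower* priority, so
    those two factors are inverted after normalization.
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    r = normalize(rewards)                # higher reward -> higher priority
    c = 1.0 - normalize(counters)         # rarely used   -> higher priority
    p = 1.0 - normalize(policy_probs)     # unlikely act. -> higher priority

    rcp = np.stack([r, c, p])             # shape (3, n_transitions)
    agg = {"min": rcp.min(axis=0),
           "max": rcp.max(axis=0),
           "mean": rcp.mean(axis=0)}[aggregate]

    # Combine with |TD error| as in standard PER; eps keeps every
    # priority strictly positive so no transition is starved.
    priorities = (np.abs(np.asarray(td_errors)) + eps) * (agg + eps)
    return priorities / priorities.sum()  # sampling distribution
```

Under this sketch, `aggregate="mean"` corresponds to the strategy the abstract reports as the most consistently stable across environments.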
Keywords
Deep reinforcement learning; Prioritized Experience Replay; Deep Q-Network