Style 1: take the mean over each fixed-size interval and plot the mean curve, with the curve containing all the raw points drawn faded (low opacity) underneath.
Taken from Z. Mou, Y. Zhang, F. Gao, H. Wang, T. Zhang and Z. Han, "Deep Reinforcement Learning based Three-Dimensional Area Coverage with UAV Swarm," IEEE Journal on Selected Areas in Communications, doi: 10.1109/JSAC.2021.3088718.

Caption text:
Fig. 9 shows the rewards of SDQN, the variants of SDQN, and other RL algorithms during the training process. The number of training episodes is set to 800, with 200,000 steps each. Note that SDQN-nC represents the SDQN algorithm with no CNN in the observation history model, and SDQN-nD is the SDQN algorithm with no panel divisions of terrain Q in advance. From Fig. 9, we can see that the rewards of SDQN rise much more quickly than those of the other four algorithms. The final rewards of SDQN-nC are lower than those of SDQN, which indicates that the CNN in the observation history model correctly extracts the features of the coverage information of each LUAV and its neighbors. Moreover, the rewards of SDQN-nD rise more slowly than those of both SDQN and SDQN-nC, which indicates that the panel divisions based on prior knowledge play an important part in the performance improvement. From the strongly oscillating reward curve of SDQN-nD, we can see that the panel divisions reduce the performance variance of the LUAVs by making their patch selections more disciplined. Furthermore, SDQN performs better than both the Actor-Critic and REINFORCE algorithms. The rewards of Actor-Critic have lower variance than those of REINFORCE, because the Actor-Critic algorithm uses an extra critic network to guide the improvement directions of the policies.
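Below is a minimal matplotlib sketch of this first style: the raw per-episode rewards are drawn with low alpha so they appear faded, and the mean over fixed-size windows is drawn on top. The synthetic reward data, the 20-episode window, and the plot styling are all assumptions for illustration; substitute your own training log.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
episodes = np.arange(800)
# Synthetic noisy reward curve (assumption): rises, then plateaus.
rewards = 100 * (1 - np.exp(-episodes / 200)) + rng.normal(0, 10, size=episodes.size)

window = 20  # episodes per averaging window (assumed value)
n = episodes.size // window
mean_rewards = rewards[: n * window].reshape(n, window).mean(axis=1)
mean_x = episodes[: n * window].reshape(n, window).mean(axis=1)

# Faded curve with all points, plus the windowed-mean curve on top.
plt.plot(episodes, rewards, color="tab:blue", alpha=0.25, label="raw rewards")
plt.plot(mean_x, mean_rewards, color="tab:blue", label=f"mean over {window} episodes")
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.legend()
plt.show()
```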


Style 2: only the mean of each short segment is plotted, so the fluctuations are not shown at all.
Taken from R. Ding, Y. Xu, F. Gao, et al., "Trajectory Design and Access Control for Air-Ground Coordinated Communications System with Multi-Agent Deep Reinforcement Learning," IEEE Internet of Things Journal, 2021.
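For contrast, here is a minimal sketch of this second style: only the per-segment means are drawn, so the curve looks smooth and the within-segment fluctuations are invisible. The synthetic data, the 100-step segment length, and the marker styling are again assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
steps = np.arange(3000)
# Synthetic noisy reward curve (assumption).
rewards = 50 * (1 - np.exp(-steps / 800)) + rng.normal(0, 8, size=steps.size)

segment = 100  # steps per segment (assumed value)
means = rewards.reshape(-1, segment).mean(axis=1)
centers = np.arange(means.size) * segment + segment / 2  # segment midpoints

# Only the segment means are drawn; the raw fluctuations are discarded.
plt.plot(centers, means, marker="o", markersize=3)
plt.xlabel("Training step")
plt.ylabel("Mean reward per segment")
plt.show()
```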


Taken from X. Liu, Y. Liu, Y. Chen, et al., "Machine Learning Aided Trajectory Design and Power Control of Multi-UAV," in Proc. 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1-6.
