Style 1: take the mean over each fixed-size interval and plot the mean curve, with the curve containing all the raw points drawn faded (low opacity) underneath.
Taken from Z. Mou, Y. Zhang, F. Gao, H. Wang, T. Zhang and Z. Han, "Deep Reinforcement Learning based Three-Dimensional Area Coverage with UAV Swarm," IEEE Journal on Selected Areas in Communications, doi: 10.1109/JSAC.2021.3088718.

Caption text:
Fig. 9 shows the rewards of SDQN, the variants of SDQN, and other RL algorithms during the training process. The number of training episodes is set to 800, with 200,000 steps each. Note that SDQN-nC represents the SDQN algorithm with no CNN in the observation history model, and SDQN-nD is the SDQN algorithm with no panel divisions of terrain Q in advance. From Fig. 9, we can see that the rewards of SDQN rise much more quickly than those of the other four algorithms. The final rewards of SDQN-nC are lower than those of SDQN, which indicates that the CNN in the observation history model correctly extracts the features of the coverage information of each LUAV and its neighbors. Moreover, the rewards of SDQN-nD rise more slowly than those of both SDQN and SDQN-nC, which indicates that the panel divisions based on prior knowledge play an important part in the performance improvement. From the strongly oscillating reward curve of SDQN-nD, we can see that the panel divisions reduce the performance variance of the LUAVs by making their patch selections more disciplined. Furthermore, SDQN performs better than both the Actor-Critic and REINFORCE algorithms. The rewards of Actor-Critic have lower variance than those of REINFORCE, because the Actor-Critic algorithm uses an extra critic network to guide the improvement directions of the policies.
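Below is a minimal matplotlib sketch of this first style: the raw per-episode rewards are drawn with low alpha so they appear faded, and the mean over fixed-size windows is drawn on top. The synthetic reward data, the 20-episode window, and the plot styling are all assumptions for illustration; substitute your own training log.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
episodes = np.arange(800)
# Synthetic noisy reward curve (assumption): rises, then plateaus.
rewards = 100 * (1 - np.exp(-episodes / 200)) + rng.normal(0, 10, size=episodes.size)

window = 20  # episodes per averaging window (assumed value)
n = episodes.size // window
mean_rewards = rewards[: n * window].reshape(n, window).mean(axis=1)
mean_x = episodes[: n * window].reshape(n, window).mean(axis=1)

# Faded curve with all points, plus the windowed-mean curve on top.
plt.plot(episodes, rewards, color="tab:blue", alpha=0.25, label="raw rewards")
plt.plot(mean_x, mean_rewards, color="tab:blue", label=f"mean over {window} episodes")
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.legend()
plt.show()
```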


Style 2: only the mean of each short segment is plotted, so the fluctuations are not shown at all.
Taken from R. Ding, Y. Xu, F. Gao, et al., "Trajectory Design and Access Control for Air-Ground Coordinated Communications System with Multi-Agent Deep Reinforcement Learning," IEEE Internet of Things Journal, 2021.
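For contrast, here is a minimal sketch of this second style: only the per-segment means are drawn, so the curve looks smooth and the within-segment fluctuations are invisible. The synthetic data, the 100-step segment length, and the marker styling are again assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
steps = np.arange(3000)
# Synthetic noisy reward curve (assumption).
rewards = 50 * (1 - np.exp(-steps / 800)) + rng.normal(0, 8, size=steps.size)

segment = 100  # steps per segment (assumed value)
means = rewards.reshape(-1, segment).mean(axis=1)
centers = np.arange(means.size) * segment + segment / 2  # segment midpoints

# Only the segment means are drawn; the raw fluctuations are discarded.
plt.plot(centers, means, marker="o", markersize=3)
plt.xlabel("Training step")
plt.ylabel("Mean reward per segment")
plt.show()
```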


Taken from X. Liu, Y. Liu, Y. Chen, et al., "Machine Learning Aided Trajectory Design and Power Control of Multi-UAV," in Proc. 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1-6.
