Theory

| Title | Citation | Notes |
| --- | --- | --- |
| Reinforcement learning: An introduction | Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. MIT Press, 2018. | Introductory textbook |
| Reinforcement Learning | Wiering M A, Van Otterlo M. Reinforcement learning[J]. Adaptation, learning, and optimization, 2012, 12(3): 729. | Introductory textbook |
| Q-learning | Watkins C J C H, Dayan P. Q-learning[J]. Machine learning, 1992, 8(3): 279-292. | Convergence of the Q-learning algorithm |
| Convergence of Q-learning: A simple proof | Melo F S. Convergence of Q-learning: A simple proof[J]. Institute Of Systems and Robotics, Tech. Rep, 2001: 1-4. | Convergence of the Q-learning algorithm |
| Human-level control through deep reinforcement learning | Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. | Introduced the DQN (Deep Q-Network) algorithm |
| Policy gradient methods for reinforcement learning with function approximation | Sutton R S, McAllester D A, Singh S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in neural information processing systems. 2000: 1057-1063. | Proved the policy gradient theorem under function approximation, the foundation of policy gradient methods |
| Deterministic Policy Gradient Algorithms | Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms[C]//International conference on machine learning. PMLR, 2014: 387-395. | Proposed the DPG algorithm |
| Continuous control with deep reinforcement learning | Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015. | Proposed the DDPG algorithm |
| Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems | Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems[J]. The Knowledge Engineering Review, 2012, 27(1): 1-31. | Summarizes the difficulties, chiefly coordination problems, that multi-agent RL faces compared with single-agent RL |
| Multi-agent actor-critic for mixed cooperative-competitive environments | Lowe R, Wu Y I, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in neural information processing systems, 2017, 30. | Proposed the MADDPG algorithm |
| Trust region policy optimization | Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897. | Proposed the TRPO algorithm |
| Proximal policy optimization algorithms | Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017. | Proposed the PPO algorithm |
| Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International conference on machine learning. 2018. | Proposed the Soft Actor-Critic (SAC) algorithm |
| Actor-Attention-Critic for Multi-Agent Reinforcement Learning | Iqbal S, Sha F. Actor-attention-critic for multi-agent reinforcement learning[C]//International conference on machine learning. 2019. | Explores introducing an attention mechanism into (multi-agent) reinforcement learning |
| Counterfactual Multi-Agent Policy Gradients | Foerster J N, Farquhar G, Afouras T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the AAAI conference on artificial intelligence. 2018. | Proposed the COMA algorithm |
| Mean Field Multi-Agent Reinforcement Learning | Yang Y, Luo R, Li M, et al. Mean field multi-agent reinforcement learning[J]. arXiv preprint arXiv:1802.05438, 2018. | Proposed mean field multi-agent RL (MFRL) |
| A Survey of Multi-Agent Reinforcement Learning with Communication | Zhu C, Dastani M, Wang S. A survey of multi-agent reinforcement learning with communication[J]. arXiv preprint arXiv:2203.08975, 2022. | Reviews the state of research on multi-agent MDPs (MAMDPs) with communication between agents |
| On Learning Intrinsic Rewards for Policy Gradient Methods | Zheng Z, Oh J, Singh S. On learning intrinsic rewards for policy gradient methods[C]//Advances in neural information processing systems. 2018. | Proposed learning intrinsic rewards, a form of reward shaping, to address sparse and distracting rewards |

Applications

| Title | Citation | Notes |
| --- | --- | --- |
| Deep Reinforcement Learning for Internet of Things: A Comprehensive Survey | Chen W, Qiu X, Cai T, et al. Deep reinforcement learning for Internet of Things: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2021, 23: 1659-1692. | Survey covering mainstream RL algorithms and RL applications in UAVs (unmanned aerial vehicles), MEC (mobile edge computing), packet routing, and more |
| 3D UAV Trajectory Design and Frequency Band Allocation for Energy-Efficient and Fair Communication: A Deep Reinforcement Learning Approach | Ding R, Gao F, Shen X S. 3D UAV trajectory design and frequency band allocation for energy-efficient and fair communication: A deep reinforcement learning approach[J]. IEEE Transactions on Wireless Communications, 2020, 19(12): 7796-7809. DOI: 10.1109/TWC.2020.3016024. | Applies DDPG to joint UAV communication resource allocation and trajectory planning |
