A Curated List of Deep Reinforcement Learning Papers

  • 2019-11-29

I. The Pioneering Work: DQN

  1. Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.

  2. Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

II. DQN Variants (Algorithmic Improvements)

  1. Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.

  2. Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

  3. Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

  4. Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

  5. Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

  6. Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

  7. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.

  8. Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver

  9. Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

  10. State of the Art Control of Atari Games using shallow reinforcement learning

  11. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)

  12. Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)

  13. Safe and Efficient Off-Policy Reinforcement Learning (updated 12.20)

  14. The Predictron: End-To-End Learning and Planning (updated 1.3)

III. DQN Variants (Model Improvements)

  1. Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

  2. Deep Attention Recurrent Q-Network

  3. Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

  4. Progressive Neural Networks

  5. Language Understanding for Text-based Games Using Deep Reinforcement Learning

  6. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

  7. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

  8. Recurrent Reinforcement Learning: A Hybrid Approach

  9. Value Iteration Networks, NIPS, 2016 (updated 12.20)

  10. MazeBase: A sandbox for learning from games (updated 12.20)

  11. Strategic Attentive Writer for Learning Macro-Actions (updated 12.20)

IV. Policy-Gradient-Based Deep Reinforcement Learning

Deep policy gradient:


  1. End-to-End Training of Deep Visuomotor Policies

  2. Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

  3. Trust Region Policy Optimization


Deep actor-critic algorithms:


  1. Deterministic Policy Gradient Algorithms

  2. Continuous control with deep reinforcement learning

  3. High-Dimensional Continuous Control Using Generalized Advantage Estimation

  4. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

  5. Deep Reinforcement Learning in Parameterized Action Space

  6. Memory-based control with recurrent neural networks

  7. Terrain-adaptive locomotion skills using deep reinforcement learning

  8. Sample Efficient Actor-Critic with Experience Replay (updated 11.13)


Search and supervision:


  1. End-to-End Training of Deep Visuomotor Policies

  2. Interactive Control of Diverse Complex Characters with Neural Networks


Improved exploration in continuous action spaces:


  1. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks


Combining policy gradient and Q-learning:


  1. Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)

  2. PGQ: Combining Policy Gradient and Q-Learning (updated 11.13)


Other policy gradient papers:


  1. Gradient Estimation Using Stochastic Computation Graphs

  2. Continuous Deep Q-Learning with Model-based Acceleration

  3. Benchmarking Deep Reinforcement Learning for Continuous Control

  4. Learning Continuous Control Policies by Stochastic Value Gradients

  5. Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 12.20)

V. Hierarchical DRL

  1. Deep Successor Reinforcement Learning

  2. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

  3. Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks

  4. Stochastic Neural Networks for Hierarchical Reinforcement Learning, C. Florensa, Y. Duan, P. Abbeel (updated 11.14)

VI. Multi-Task and Transfer Learning in DRL

  1. ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources

  2. A Deep Hierarchical Approach to Lifelong Learning in Minecraft

  3. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

  4. Policy Distillation

  5. Progressive Neural Networks

  6. Universal Value Function Approximators

  7. Multi-task learning with deep model based reinforcement learning (updated 11.14)

  8. Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)

VII. DRL Models with External Memory Modules

  1. Control of Memory, Active Perception, and Action in Minecraft

  2. Model-Free Episodic Control

VIII. Exploration vs. Exploitation in DRL

  1. Action-Conditional Video Prediction using Deep Networks in Atari Games

  2. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

  3. Deep Exploration via Bootstrapped DQN

  4. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

  5. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

  6. Unifying Count-Based Exploration and Intrinsic Motivation

  7. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)

  8. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)

  9. VIME: Variational Information Maximizing Exploration (updated 12.20)

IX. Multi-Agent DRL

  1. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

  2. Multiagent Cooperation and Competition with Deep Reinforcement Learning

X. Inverse DRL

  1. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

  2. Maximum Entropy Deep Inverse Reinforcement Learning

  3. Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)

XI. Exploration + Supervised Learning

  1. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning

  2. Better Computer Go Player with Neural Network and Long-term Prediction

  3. Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

XII. Asynchronous DRL

  1. Asynchronous Methods for Deep Reinforcement Learning

  2. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)

XIII. Methods for Harder Game Scenarios

  1. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

  2. Strategic Attentive Writer for Learning Macro-Actions

  3. Unifying Count-Based Exploration and Intrinsic Motivation

XIV. A Single Network Playing Multiple Games

  1. Policy Distillation

  2. Universal Value Function Approximators

  3. Learning values across many orders of magnitude

XV. Texas Hold'em Poker

  1. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

  2. Fictitious Self-Play in Extensive-Form Games

  3. Smooth UCT search in computer poker

XVI. Doom

  1. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning

  2. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

  3. Playing FPS Games with Deep Reinforcement Learning

  4. Learning to Act by Predicting the Future (updated 11.13)

  5. Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)

XVII. Large Action Spaces

  1. Deep Reinforcement Learning in Large Discrete Action Spaces

XVIII. Parameterized Continuous Action Spaces

  1. Deep Reinforcement Learning in Parameterized Action Space

XIX. Deep Models

  1. Learning Visual Predictive Models of Physics for Playing Billiards

  2. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, J. Schmidhuber, arXiv, 2015.

  3. Learning Continuous Control Policies by Stochastic Value Gradients


  4. Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

  5. Action-Conditional Video Prediction using Deep Networks in Atari Games

  6. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

XX. Applications of DRL

Robotics:


  1. Trust Region Policy Optimization

  2. Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

  3. Path Integral Guided Policy Search

  4. Memory-based control with recurrent neural networks

  5. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

  6. Learning Deep Neural Network Policies with Continuous Memory States

  7. High-Dimensional Continuous Control Using Generalized Advantage Estimation

  8. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

  9. End-to-End Training of Deep Visuomotor Policies

  10. DeepMPC: Learning Deep Latent Features for Model Predictive Control

  11. Deep Visual Foresight for Planning Robot Motion

  12. Deep Reinforcement Learning for Robotic Manipulation

  13. Continuous Deep Q-Learning with Model-based Acceleration

  14. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

  15. Asynchronous Methods for Deep Reinforcement Learning

  16. Learning Continuous Control Policies by Stochastic Value Gradients


Machine translation:


  1. Simultaneous Machine Translation using Deep Reinforcement Learning


Object localization:


  1. Active Object Localization with Deep Reinforcement Learning


Target-driven visual navigation:


  1. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning


Automatic hyperparameter tuning:


  1. Using Deep Q-Learning to Control Optimization Hyperparameters


Dialogue systems:


  1. Deep Reinforcement Learning for Dialogue Generation

  2. SimpleDS: A Simple Deep Reinforcement Learning Dialogue System

  3. Strategic Dialogue Management via Deep Reinforcement Learning

  4. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning


Video prediction:


  1. Action-Conditional Video Prediction using Deep Networks in Atari Games


Text-to-speech:


  1. WaveNet: A Generative Model for Raw Audio


Text generation:


  1. Generating Text with Deep Reinforcement Learning


Text-based games:


  1. Language Understanding for Text-based Games Using Deep Reinforcement Learning


Radio control and signal detection:


  1. Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent


Learning to perform physics experiments with DRL:


  1. Learning to Perform Physics Experiments via Deep Reinforcement Learning (updated 11.13)


Accelerating convergence with DRL:


  1. Deep Reinforcement Learning for Accelerating the Convergence Rate (updated 11.14)


Designing neural networks with DRL:


  1. Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)

  2. Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)

  3. Neural Architecture Search with Reinforcement Learning (updated 11.14)


Traffic signal control:


  1. Using a Deep Reinforcement Learning Agent for Traffic Signal Control (updated 11.14)


Autonomous driving:


  1. CARMA: A Deep Reinforcement Learning Approach to Autonomous Driving (updated 12.20)

  2. Deep Reinforcement Learning for Simulated Autonomous Vehicle Control (updated 12.20)

  3. Deep Reinforcement Learning framework for Autonomous Driving (updated 12.20)

XXI. Other Directions

Avoiding dangerous states:


  1. Combating Deep Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear (updated 11.14)


On-policy vs. off-policy comparison in DRL:


  1. On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (updated 11.14)


Note 1: If downloading the papers one by one is too much trouble, send me a private message and I will send you a bundled archive.


Note 2: Additions are welcome, whether newly published papers or ones I have missed.


This article is reposted from Alex-zhai's Zhihu account.


Original link: https://zhuanlan.zhihu.com/p/23600620

