A Curated List of Deep Reinforcement Learning Papers, by Alex-zhai


1. The Pioneer: DQN

Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

2. Improved Variants of DQN (Algorithmic Improvements)

Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al., arXiv, 2015.
Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.
Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.
Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.
Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.
Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.
Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver
Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.
State of the Art Control of Atari Games using shallow reinforcement learning
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)
Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)
Safe and Efficient Off-Policy Reinforcement Learning (updated 12.20)
The Predictron: End-To-End Learning and Planning (updated 1.3)

3. Improved Variants of DQN (Model Improvements)

Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.
Deep Attention Recurrent Q-Network
Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.
Progressive Neural Networks
Language Understanding for Text-based Games Using Deep Reinforcement Learning
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Recurrent Reinforcement Learning: A Hybrid Approach
Value Iteration Networks, NIPS, 2016 (updated 12.20)
MazeBase: A sandbox for learning from games (updated 12.20)
Strategic Attentive Writer for Learning Macro-Actions (updated 12.20)

4. Deep Reinforcement Learning Based on Policy Gradients

Deep policy gradient:

End-to-End Training of Deep Visuomotor Policies
Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search
Trust Region Policy Optimization

Deep actor-critic algorithms:

Deterministic Policy Gradient Algorithms
Continuous control with deep reinforcement learning
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
Deep Reinforcement Learning in Parameterized Action Space
Memory-based control with recurrent neural networks
Terrain-adaptive locomotion skills using deep reinforcement learning
Sample Efficient Actor-Critic with Experience Replay (updated 11.13)

Search and supervision:

End-to-End Training of Deep Visuomotor Policies
Interactive Control of Diverse Complex Characters with Neural Networks

Improved exploration in continuous action spaces:

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

Combining policy gradient and Q-learning:

Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)
PGQ: Combining Policy Gradient and Q-learning (updated 11.13)

Other policy gradient papers:

Gradient Estimation Using Stochastic Computation Graphs
Continuous Deep Q-Learning with Model-based Acceleration
Benchmarking Deep Reinforcement Learning for Continuous Control
Learning Continuous Control Policies by Stochastic Value Gradients
Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 12.20)

5. Hierarchical DRL

Deep Successor Reinforcement Learning
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks
Stochastic Neural Networks for Hierarchical Reinforcement Learning, C. Florensa, Y. Duan, P. Abbeel (updated 11.14)

6. Multi-Task and Transfer Learning in DRL

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Policy Distillation
Progressive Neural Networks
Universal Value Function Approximators
Multi-task learning with deep model based reinforcement learning (updated 11.14)
Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)

7. DRL Models with External Memory Modules

Control of Memory, Active Perception, and Action in Minecraft
Model-Free Episodic Control

8. Exploration vs. Exploitation in DRL

Action-Conditional Video Prediction using Deep Networks in Atari Games
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks
Deep Exploration via Bootstrapped DQN
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
Unifying Count-Based Exploration and Intrinsic Motivation
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)
Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)
VIME: Variational Information Maximizing Exploration (updated 12.20)

9. Multi-Agent DRL

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
Multiagent Cooperation and Competition with Deep Reinforcement Learning

10. Inverse DRL

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Maximum Entropy Deep Inverse Reinforcement Learning
Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)

11. Exploration + Supervised Learning

Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
Better Computer Go Player with Neural Network and Long-term Prediction
Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

12. Asynchronous DRL

Asynchronous Methods for Deep Reinforcement Learning
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)

13. Methods for Harder Game Scenarios

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.
Strategic Attentive Writer for Learning Macro-Actions
Unifying Count-Based Exploration and Intrinsic Motivation

14. Playing Multiple Games with a Single Network

Policy Distillation
Universal Value Function Approximators
Learning values across many orders of magnitude

15. Texas Hold'em Poker

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Fictitious Self-Play in Extensive-Form Games
Smooth UCT search in computer poker

16. Doom

ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning
Playing FPS Games with Deep Reinforcement Learning
Learning to Act by Predicting the Future (updated 11.13)
Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)

17. Large Action Spaces

Deep Reinforcement Learning in Large Discrete Action Spaces

18. Parameterized Continuous Action Spaces

Deep Reinforcement Learning in Parameterized Action Space

19. Deep Models

Learning Visual Predictive Models of Physics for Playing Billiards
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, J. Schmidhuber, arXiv, 2015.
Learning Continuous Control Policies by Stochastic Value Gradients

Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

Action-Conditional Video Prediction using Deep Networks in Atari Games
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

20. Applications of DRL

Robotics:

Trust Region Policy Optimization
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control
Path Integral Guided Policy Search
Memory-based control with recurrent neural networks
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
Learning Deep Neural Network Policies with Continuous Memory States
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
End-to-End Training of Deep Visuomotor Policies
DeepMPC: Learning Deep Latent Features for Model Predictive Control
Deep Visual Foresight for Planning Robot Motion
Deep Reinforcement Learning for Robotic Manipulation
Continuous Deep Q-Learning with Model-based Acceleration
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
Asynchronous Methods for Deep Reinforcement Learning
Learning Continuous Control Policies by Stochastic Value Gradients

Machine translation:

Simultaneous Machine Translation using Deep Reinforcement Learning

Object localization:

Active Object Localization with Deep Reinforcement Learning

Target-driven visual navigation:

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Automatic hyperparameter control:

Using Deep Q-Learning to Control Optimization Hyperparameters

Dialogue systems:

Deep Reinforcement Learning for Dialogue Generation
SimpleDS: A Simple Deep Reinforcement Learning Dialogue System
Strategic Dialogue Management via Deep Reinforcement Learning
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

Video prediction:

Action-Conditional Video Prediction using Deep Networks in Atari Games

Text-to-speech:

WaveNet: A Generative Model for Raw Audio

Text generation:

Generating Text with Deep Reinforcement Learning

Text-based games:

Language Understanding for Text-based Games Using Deep Reinforcement Learning

Radio control and signal detection:

Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent

Learning to perform physics experiments with DRL:

Learning to Perform Physics Experiments via Deep Reinforcement Learning (updated 11.13)

Accelerating convergence with DRL:

Deep Reinforcement Learning for Accelerating the Convergence Rate (updated 11.14)

Designing neural networks with DRL:

Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)
Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)
Neural Architecture Search with Reinforcement Learning (updated 11.14)

Traffic signal control:

Using a Deep Reinforcement Learning Agent for Traffic Signal Control (updated 11.14)

Autonomous driving:

CARMA: A Deep Reinforcement Learning Approach to Autonomous Driving (updated 12.20)
Deep Reinforcement Learning for Simulated Autonomous Vehicle Control (updated 12.20)
Deep Reinforcement Learning framework for Autonomous Driving (updated 12.20)

21. Other Directions

Avoiding dangerous states:

Combating Deep Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear (updated 11.14)

On-policy vs. off-policy comparison in DRL:

On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (updated 11.14)

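Note 1: If downloading the papers one by one is too much trouble, send me a private message and I will send you the whole set as a bundle.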

Note 2: Additions are welcome, whether new papers or ones I have missed.

This article is reposted from Alex-zhai's Zhihu account.

Original link: https://zhuanlan.zhihu.com/p/23600620
