| Authors | Richard S. Sutton, Andrew G. Barto |
| Publisher | MIT Press |
| Year | 2018 |
What Problem It Solves
Provides a canonical overview or reference point for the relevant DoOperator research area.
The standard textbook introduction to reinforcement learning, covering MDPs, value functions, temporal-difference learning, policy gradients, and core algorithms.
Use when orienting a new paper, blog post, benchmark, or research plan in this area.
Do not cite the overview as evidence that a specific method works in a specific deployment setting without checking the underlying primary paper.
Related papers
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang, Lihong Li · 2015
A Survey of Constraint Formulations in Safe Reinforcement Learning
Akifumi Wachi, Xun Shen, Yanan Sui · 2024
A Comprehensive Survey on Safe Reinforcement Learning
Javier Garcia, Fernando Fernandez · 2015
Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
Junzhe Zhang, Elias Bareinboim · 2019