Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes — DoOperator Research

Authors	Junzhe Zhang, Elias Bareinboim
Journal	NeurIPS
Year	2019

What Problem It Solves

Clinical and behavioral treatment policies call for sequential decisions that respect causal constraints, not merely policies that score high reward in a simulated MDP.

The method frames learning a dynamic treatment regime (DTR) as a reinforcement learning (RL) problem while preserving the causal identification requirements for treatment effects.
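To make "a DTR as a sequential decision problem" concrete, the sketch below solves a toy two-stage regime by backward induction over histories. It is not the paper's learning algorithm: it assumes the model is fully known and unconfounded, whereas the paper concerns learning when the parameters are unknown. The model, the function names (`p_s2`, `p_y`, `solve_dtr`), and every probability are hypothetical and chosen only for illustration.

```python
# Hypothetical two-stage model; every probability below is invented for illustration.
P_S1 = {0: 0.5, 1: 0.5}  # P(S1 = s1), the initial patient state


def p_s2(s1, x1):
    """P(S2 = 1 | S1 = s1, X1 = x1): matching the first treatment to S1 helps."""
    return 0.8 if x1 == s1 else 0.3


def p_y(s1, s2, x2):
    """P(Y = 1 | S1, S2, X2): the best second treatment depends on the earlier state S1."""
    return 0.2 + 0.5 * s2 + 0.2 * (x2 == s1)


def solve_dtr():
    """Backward induction over histories; returns (value, stage-1 rule, stage-2 rule)."""
    pi1, pi2 = {}, {}
    value = 0.0
    for s1, p1 in P_S1.items():
        q1 = {}  # expected outcome of each first treatment, given S1 = s1
        for x1 in (0, 1):
            q = 0.0
            for s2 in (0, 1):
                w = p_s2(s1, x1) if s2 == 1 else 1.0 - p_s2(s1, x1)
                # Stage 2: pick the treatment that is best for this full history.
                best_x2 = max((0, 1), key=lambda x2: p_y(s1, s2, x2))
                pi2[(s1, x1, s2)] = best_x2
                q += w * p_y(s1, s2, best_x2)
            q1[x1] = q
        # Stage 1: pick the treatment with the best expected downstream outcome.
        pi1[s1] = max(q1, key=q1.get)
        value += p1 * q1[pi1[s1]]
    return value, pi1, pi2


value, pi1, pi2 = solve_dtr()
print(round(value, 3), pi1)
```

In this toy model the stage-2 rule is keyed on the whole history `(s1, x1, s2)` and ends up choosing the treatment that matches the earlier state `s1`, which is the sense in which a regime adapts to history and not only to the current state.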

Use it in health, medicine, and personalization settings where the policy adapts to each patient's or user's history.

Applicability depends on the measured histories being rich enough to account for confounding, or on additional causal structure when confounding is hidden.