| Authors | Junzhe Zhang, Elias Bareinboim |
| Journal | CausalAI Lab Technical Report R-23 |
| Year | 2016 |
What Problem It Solves
Standard MDP estimation assumes observed state captures the information needed for valid transition and reward learning; unobserved confounders break that assumption.
The paper formulates sequential decision problems in causal-model terms, clarifying which policy values are identifiable under hidden confounding.
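To illustrate why hidden confounding matters for identifiability, here is a minimal single-step simulation. It is a toy sketch, not the paper's model or algorithm: the confounder `u`, the 0.9 agreement rate of the behavior policy, and the reward rule are all hypothetical choices made for illustration. It compares the reward estimated from logged data, E[R | A=1], with the reward obtained by actually forcing the action, E[R | do(A=1)].

```python
import random


def simulate(n, policy, rng):
    """Return (action, reward) pairs; the confounder u is never logged."""
    data = []
    for _ in range(n):
        u = rng.random() < 0.5        # hidden confounder (assumed Bernoulli(0.5))
        a = policy(u, rng)            # the action may depend on u
        r = 1 if a == u else 0        # assumed rule: reward only when action matches u
        data.append((int(a), r))
    return data


def behavior_policy(u, rng):
    """Logging policy that peeks at u and follows it 90% of the time."""
    return u if rng.random() < 0.9 else (not u)


def do_policy(action):
    """Intervention do(A=action): the action ignores u entirely."""
    return lambda u, rng: bool(action)


def mean_reward_given_action(data, action):
    rewards = [r for a, r in data if a == action]
    return sum(rewards) / len(rewards)


rng = random.Random(0)
observational = simulate(100_000, behavior_policy, rng)
experimental = simulate(100_000, do_policy(1), rng)

naive = mean_reward_given_action(observational, 1)           # estimates E[R | A=1]
interventional = mean_reward_given_action(experimental, 1)   # estimates E[R | do(A=1)]
print(f"naive estimate: {naive:.3f}, interventional value: {interventional:.3f}")
```

In this toy setup the naive estimate sits near 0.9 while the interventional value sits near 0.5, because the logged action carries information about `u` that a new policy without access to `u` cannot exploit.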
Cite it as the foundational reference whenever an RL method claims robustness to hidden confounding in sequential decisions.
The formulation highlights identifiability limits; it does not make unobserved confounding disappear without additional structure or data.
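One way to make the identifiability limit concrete is the standard assumption-free ("natural") bound for a bounded reward. This bound is a general partial-identification result and is not taken from this summary. The helper name `natural_bounds` and the numbers below are hypothetical. The idea: logged data pin down the reward only for the units that happened to take the action, so the remaining probability mass can lie anywhere in the reward range.

```python
def natural_bounds(p_action, mean_reward_given_action, r_min=0.0, r_max=1.0):
    """Bounds on E[R | do(A=a)] from observational quantities alone.

    p_action: P(A=a) under the logging policy.
    mean_reward_given_action: E[R | A=a] in the logged data.
    r_min, r_max: known range of the reward (assumed bounded).
    """
    if not 0.0 <= p_action <= 1.0:
        raise ValueError("p_action must be a probability")
    observed_part = mean_reward_given_action * p_action
    unobserved_mass = 1.0 - p_action   # units whose reward under a was never seen
    lower = observed_part + r_min * unobserved_mass
    upper = observed_part + r_max * unobserved_mass
    return lower, upper


# Hypothetical logged statistics: the action was taken half the time,
# with an observed mean reward of 0.9 when taken.
lower, upper = natural_bounds(p_action=0.5, mean_reward_given_action=0.9)
print(f"E[R | do(a)] lies in [{lower:.2f}, {upper:.2f}]")
```

The interval has width `(1 - p_action) * (r_max - r_min)`, so it narrows only when the logging policy takes the action more often or when extra structure or experimental data is brought in.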
Related papers
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Stefan Wager, Susan Athey · 2017
Causal inference in statistics: An overview
Judea Pearl · 2009
Towards Causal Representation Learning
Bernhard Schölkopf, Francesco Locatello, Stefan Bauer +4 more · 2021
Elements of Causal Inference: Foundations and Learning Algorithms
Jonas Peters, Dominik Janzing, Bernhard Schölkopf · 2017