| Authors | Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill |
| Journal | arXiv |
| Year | 2020 |
What Problem It Solves
Importance sampling and doubly robust off-policy evaluation (OPE) estimators are biased when the behavior policy acted on hidden state that is unavailable to the evaluator.
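The paper develops evaluation bounds or estimators that remain informative under controlled departures from sequential ignorability (the assumption that, given the recorded history, logged actions are independent of potential outcomes).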
Use it for offline RL evaluation when logged human or production policies may have acted on private information that is not in the dataset.
It does not identify exact values without restrictions; the output is only as useful as the confounding model is credible.
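To make the failure mode and the role of the confounding model concrete, here is a minimal single-step sketch in Python. It is not the paper's sequential algorithm. It assumes a toy log with a hidden binary variable, a target policy that always plays one action, and a marginal-sensitivity-style confounding model in which the true propensity odds differ from the nominal odds by at most a factor `gamma`. The function names, the toy data, and `gamma` are all illustrative assumptions.

```python
from typing import List, Tuple


def nominal_is_estimate(actions: List[int], rewards: List[float],
                        target_action: int, nominal_propensity: float) -> float:
    """Plain importance-sampling estimate for a policy that always plays
    target_action, using the propensity the evaluator can see."""
    total = sum(r / nominal_propensity
                for a, r in zip(actions, rewards) if a == target_action)
    return total / len(actions)


def sensitivity_bounds(actions: List[int], rewards: List[float],
                       target_action: int, nominal_propensity: float,
                       gamma: float) -> Tuple[float, float]:
    """Worst-case self-normalized estimates when each true inverse-propensity
    weight may lie anywhere in an interval implied by gamma (assumed model)."""
    if gamma < 1.0:
        raise ValueError("gamma must be at least 1")
    matched = sorted(r for a, r in zip(actions, rewards) if a == target_action)
    if not matched:
        raise ValueError("no logged sample took the target action")
    inverse_odds = 1.0 / nominal_propensity - 1.0
    w_lo = 1.0 + inverse_odds / gamma   # smallest admissible weight
    w_hi = 1.0 + inverse_odds * gamma   # largest admissible weight

    def extreme(maximize: bool) -> float:
        # The optimum of this weighted-average problem is a threshold rule on
        # the sorted rewards, so trying every split point is enough.
        small_w, large_w = (w_lo, w_hi) if maximize else (w_hi, w_lo)
        m = len(matched)
        values = []
        for k in range(m + 1):
            num = small_w * sum(matched[:k]) + large_w * sum(matched[k:])
            den = small_w * k + large_w * (m - k)
            values.append(num / den)
        return max(values) if maximize else min(values)

    return extreme(False), extreme(True)


# Toy log (hypothetical): hidden u in {0, 1}, reward equals u, and the behavior
# policy plays action 1 in 8 of 10 cases when u == 1 but 2 of 10 when u == 0.
hidden = [1] * 10 + [0] * 10
actions = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
rewards = [float(u) for u in hidden]

# The evaluator never sees u, so it can only estimate the marginal propensity.
nominal_propensity = sum(actions) / len(actions)

# Rewards do not depend on the action here, so the true value is the mean reward.
true_value = sum(rewards) / len(rewards)
naive = nominal_is_estimate(actions, rewards, 1, nominal_propensity)
lower, upper = sensitivity_bounds(actions, rewards, 1, nominal_propensity, gamma=4.0)

print(f"true value {true_value:.3f}, nominal IS {naive:.3f}, "
      f"bounds [{lower:.3f}, {upper:.3f}]")
```

In this toy log the nominal importance-sampling estimate is 0.8 while the true value is 0.5. The interval computed with `gamma=4.0` contains the true value only because that `gamma` happens to cover the actual confounding in the toy data. A smaller `gamma` gives a tighter interval that can miss it, which is the sense in which the output is only as useful as the confounding model is credible.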
Related papers
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Stefan Wager, Susan Athey · 2017
Causal inference in statistics: An overview
Judea Pearl · 2009
Towards Causal Representation Learning
Bernhard Scholkopf, Francesco Locatello, Stefan Bauer +4 more · 2021
Elements of Causal Inference: Foundations and Learning Algorithms
Jonas Peters, Dominik Janzing, Bernhard Scholkopf · 2017