Off-Policy Policy Evaluation for Sequential Decisions under Unobserved Confounding

Authors	Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill
Journal	arXiv
Year	2020

What Problem It Solves

Standard importance sampling and doubly robust off-policy evaluation (OPE) estimators are biased, and therefore invalid, when the behavior policy acted on hidden state that is unavailable to the evaluator.
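
A minimal worked example, not taken from the paper, makes this failure concrete. It assumes a hypothetical one-step problem with a binary hidden state u; every probability and function name below is an illustrative assumption. Expectations are computed exactly by enumeration, so the gap is bias rather than sampling noise.

```python
from itertools import product

# Hypothetical one-step problem (all numbers are illustrative assumptions).
P_U = {0: 0.5, 1: 0.5}            # hidden state u, unseen by the evaluator
P_A1_GIVEN_U = {0: 0.2, 1: 0.8}   # behavior policy: P(a=1 | u)


def reward(u, a):
    # Reward depends only on the hidden state, so the action has no effect.
    return float(u)


def true_value_always_one():
    # Value of the evaluation policy "always play a=1": E[reward(u, 1)].
    return sum(P_U[u] * reward(u, 1) for u in P_U)


def nominal_propensity():
    # What the evaluator can estimate from logs: P(a=1), marginal over u.
    return sum(P_U[u] * P_A1_GIVEN_U[u] for u in P_U)


def expected_is_estimate():
    # Exact expectation, under the logging distribution, of the importance
    # sampling term 1{a=1} * reward / P(a=1).
    e = nominal_propensity()
    total = 0.0
    for u, a in product(P_U, (0, 1)):
        p_a = P_A1_GIVEN_U[u] if a == 1 else 1.0 - P_A1_GIVEN_U[u]
        indicator = 1.0 if a == 1 else 0.0
        total += P_U[u] * p_a * indicator * reward(u, a) / e
    return total


print(true_value_always_one())   # true value of "always a=1"
print(expected_is_estimate())    # what importance sampling converges to
```

In this toy setup the true value is 0.5, but the importance sampling estimate converges to 0.8: the behavior policy picks a=1 more often when u=1, so the logged a=1 samples over-represent high rewards, and reweighting by the observable propensity cannot undo that.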

The paper develops bounds on the evaluation policy's value, along with estimators of those bounds, that remain informative when sequential ignorability is relaxed by a controlled, user-specified amount.
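
To illustrate what a bound under a controlled departure from ignorability can look like, here is a minimal sketch for the single-step case only. It is not the paper's sequential procedure: it swaps in a simpler bounded odds-ratio (marginal sensitivity) model, where an assumed parameter gamma >= 1 limits how far the true propensity's odds can differ from the nominal odds. The function names, and the convention that the inputs are the logged samples whose action agrees with the evaluation policy, are assumptions made for this example.

```python
def weight_bounds(e, gamma):
    # e: nominal propensity of the logged action; gamma >= 1 bounds the odds
    # ratio between the true (confounded) and nominal propensities.
    # Returns the interval of admissible inverse-propensity weights.
    lo = 1.0 + (1.0 / e - 1.0) / gamma
    hi = 1.0 + (1.0 / e - 1.0) * gamma
    return lo, hi


def value_bounds(rewards, propensities, gamma):
    """Range of the self-normalized IS estimate over all admissible weights."""
    pairs = sorted(zip(rewards, propensities))  # ascending reward
    ys = [y for y, _ in pairs]
    bounds = [weight_bounds(e, gamma) for _, e in pairs]

    def extreme(upper):
        best = None
        # The optimum is a threshold rule in reward order, so it is enough
        # to try every split point k between low and high weights.
        for k in range(len(ys) + 1):
            if upper:
                # small weights on the k lowest rewards, large on the rest
                w = [b[0] if i < k else b[1] for i, b in enumerate(bounds)]
            else:
                # large weights on the k lowest rewards, small on the rest
                w = [b[1] if i < k else b[0] for i, b in enumerate(bounds)]
            val = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
            if best is None or (val > best if upper else val < best):
                best = val
        return best

    return extreme(False), extreme(True)


# Illustrative data (made up for this sketch).
rewards = [0.0, 1.0, 1.0, 0.0]
propensities = [0.5, 0.5, 0.25, 0.8]
for gamma in (1.0, 2.0, 3.0):
    print(gamma, value_bounds(rewards, propensities, gamma))
```

With gamma = 1 the interval collapses to the usual self-normalized importance sampling estimate, and it widens as gamma grows, which is the sense in which the bounds are only as informative as the assumed level of confounding.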

Use it for offline RL evaluation when the logged human or production policies may have acted on private information that is not in the dataset.

It does not identify exact values without restrictions; the output is only as useful as the confounding model is credible.